ABSTRACT


This  paper  describes  a  constant  percentage  bandwidth 
transform  for  acoustic  signal  processing.  Such  a  transform 
is  shown  to  emulate  behavior  found  in  the  human  auditory 
system,  making  possible  both  the  imitation  of  peripheral 
auditory  analysis,  and  processing  which  is  more  closely 
linked  to  perception  than  is  possible  using  constant 
bandwidth  analysis. 

To enable such processing, a synthesis transformation
is developed which, when cascaded with the analysis
transformation, provides an analysis-synthesis identity in
the absence of spectral modification. Various properties
of the transform pair are derived, and a filterbank analogy
is used to create a basis for intuitive understanding of
the transform's operation and properties.

The effects of spectral domain modification are
described and shown to be related to the properties of the
analysis window function.

Principles governing discrete implementation of the
transform pair are discussed, and relationships are
formalized which specify the sampling of the spectral
domain. These relationships are shown to depend
simultaneously on the analysis window function and the
selectivity (or Q) of the analysis. An alternative form of
the synthesis is given which facilitates a more nearly
optimal logarithmic sampling of the spectral frequency
axis. A minimal sampling pattern is given for the spectral
domain which has an overall rate equivalent to the rate
necessary to sample the constant bandwidth spectral domain.

The nature and computation of the constant-Q spectral
magnitude and phase functions are discussed, and three main
methods are evaluated whereby the spectral phase may be
unwrapped.

Fine  resolution  constant-Q  spectrograms  are  presented 
which  show  clearly  the  properties  of  constant-Q  analysis 
applied  to  speech. 

The use of the transform pair is discussed in the
solution of the perception-related problem of time scale
compression and expansion of speech. Results of this
experiment are discussed.

Finally,  suggestions  for  further  research  and 
applications  are  presented. 
CHAPTER 1


INTRODUCTION 


1.1 Overview

The  usefulness,  in  signal  processing,  of 
transformations  which  produce  spectral  representations  of 
temporal  or  spatial  data  is  rooted  in  part  in  the 
underlying  physiological  processes  which  artificial  signal 
processing  attempts  to  imitate  or  augment.  This  is 
particularly  true  of  the  transforms  used  in  processing 
sound  signals.  Current  interest  in  the  short-time  Fourier 
transform,  for  example,  is  related  to  the  rough  analogy 
that  exists  between  the  short-time  spectral  domain  and  the 
real-time  analysis  performed  by  the  human  inner  ear. 
Because  the  information  obtained  from  the  short-time 
Fourier  transform  exists  in  a  format  related  to  the  format 
in  which  information  appears  to  emerge  from  the  inner  ear, 
intuitive  descriptions  of  signal  qualities  such  as  pitch, 
temporal  change,  amplitude  and  harmonic  content  can  easily 
be  related  to  the  properties  of  the  formal  mathematical 
representation. Such a relationship gives insight both into
the underlying physiological processes involved and into
artificial processes which may be implemented to
effect perception-related changes.


Properties  of  transformations  such  as  those  of  the 
short-time  Fourier  transform  which  relate  to  properties  of 
physical systems determine the appropriateness of such
transforms  as  models.  Clearly,  as  a  model's  properties 
more  completely  conform  to  the  properties  of  the  system 
which  it  attempts  to  emulate,  it  becomes  more  useful  as  a 
tool  for  discovery  of  further  system  properties,  and  for 
duplication  and  augmentation  of  processes  known  to  occur 
within  the  real  system. 

An  examination  of  the  properties  of  the  short-time 
Fourier  transform  as  a  model  of  the  hearing  process  will  be 
taken  up  in  the  following  section,  leading  to  the 
conclusion,  already  expressed  by  several  researchers,  that 
it lacks some characteristics essential to the analysis
performed  in  the  human  auditory  system.  Section  1.3  then 
presents  preliminary  evidence  that  a  constant  percentage 
bandwidth  (or  constant-Q)  transform  should  more  adequately 
model  the  human  auditory  system.  Finally,  Section  1.4 
describes  the  contribution  offered  by  this  work,  and 
outlines  the  remaining  chapters. 

1.2 Short-time Fourier Transform Modelling of Human Auditory Signal Analysis

A complete description of the electrical or mechanical
analogs proposed by researchers in their attempts to
partially account for properties of the peripheral auditory
system is beyond the scope of this work. The subject is
treated in numerous references [1,2,3]. In addition, the
details of the physiological system or of its analogs are
complex, resisting concise mathematical modelling, and
hence have not, to date, been useful in solving the usual
signal processing problems: noise removal, parameter
extraction and transmission, etc. A mathematical
transformation,  on  the  other  hand,  can  relate  more  easily 
to  these  problems.  One  such  transform,  the  short-time 
Fourier  transform,  the  properties  of  which  are 
well-understood,  provides  both  a  forward  and  a  reverse 
mapping  to  a  domain  which  resembles  the  analysis  domain  of 
the  ear.  However,  even  the  most  superficial  examination  of 
the  ear’s  physiology  reveals  weaknesses  in  the  short-time 
Fourier  transform  as  a  model  of  auditory  analysis.  The 
transform  conforms  to  the  ear-property  hypothesized  by 
Helmholtz  [4],  and  later  corroborated  by  Bekesy  [5]  and 
others,  wherein  spatially  selective  time-limited  frequency 
analysis  is  performed.  It  fails,  however,  to  emulate  other 
aspects  of  the  behavior  observed  by  Bekesy.  In  particular, 
Bekesy  observed  that  the  basilar  membrane  behaves  as  a 
non-uniform or dispersive transmission line such that tones
travel a distance inversely proportional to their frequency,
to the point where they are sensed, and are then rapidly
attenuated. He further observed that the envelope of a tone
traveling the length of the membrane maintains its shape as
it moves the 35 mm to the apex of the membrane. In other
words, the mechanical analysis performed by the inner ear
was reported by Bekesy to have a rather low, but constant, Q. (The
selectivity, Q, of an instrument is a measure of its
ability to resolve or respond to a particular frequency
component independent of the presence of nearby spectral
components. Selectivity is formally defined as the ratio
of the center frequency of a response peak to the -3
decibel bandwidth of that peak.) Though it has been
extended  in  accuracy,  Bekesy's  basic  result  that 
frequencies  are  resolved  with  roughly  constant  selectivity 
at  positions  logarithmically  spaced  along  the  length  of  the 
basilar membrane still seems correct. The value for the Q
of  the  ear's  analysis,  convergently  verified  by  recent 
experiments,  has  been  specified  by  Searle  [6].  He  gives 
the  resolution  as  roughly  one  third  of  an  octave  (a  Q  equal 
to  about  4.3).  As  described  in  Chapter  2,  the  short-time 
Fourier  transform  behaves  as  a  bank  of  equally  spaced, 
constant bandwidth filters. Hence, analysis performed at
high frequencies is over-resolved in frequency while that
performed at low frequencies may be under-resolved. This
difficulty  in  the  constant  bandwidth  short-time  Fourier 
transform  has  been  noted  by  Callahan  [7],  who  points  out 
that  for  speech  analysis,  the  window  length  is  a  compromise 
between  adequate  time  resolution  at  high  frequencies  and 
enough  frequency  resolution  at  low  frequencies.  In  speech 
processing  schemes  where  accurate  pitch  and  vocal  tract 
resonance  information  are  required  simultaneously,  the 
dilemma is insoluble, and separate pitch extraction is
ordinarily necessitated. Clearly, constant-bandwidth
analysis  fails  in  this  respect  as  a  model  of  peripheral 
auditory  analysis.  The  list  of  other  ear  phenomena  not 
described  (at  least  not  trivially)  by  the  constant 
bandwidth analysis model includes Tartini's combination
tones, Seebeck and Schouten's residue pitch, Sachs and
Kiang's two-tone suppression, and many others. These
phenomena are complex and have been described by
Searle [6] as having second-order importance in initial
efforts to model the ear via mathematical transformations.
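Searle's figure of a Q near 4.3 for one-third-octave resolution follows directly from the definition of selectivity given above. The minimal sketch below assumes that the band edges lie geometrically symmetric about the center frequency (at f_c times 2 to the power plus or minus one sixth for a third octave), so that f_c cancels from the ratio; the function name is illustrative only.

```python
def q_of_fractional_octave(fraction: float) -> float:
    """Selectivity Q = centre frequency / bandwidth for a band spanning
    `fraction` of an octave.

    Assumes the band edges are geometrically symmetric about the centre
    frequency f_c, i.e. at f_c * 2**(-fraction / 2) and f_c * 2**(+fraction / 2),
    so f_c cancels out of the ratio.
    """
    upper = 2.0 ** (fraction / 2.0)   # upper band edge divided by f_c
    lower = 2.0 ** (-fraction / 2.0)  # lower band edge divided by f_c
    return 1.0 / (upper - lower)


q_third = q_of_fractional_octave(1.0 / 3.0)
print(f"Q for one-third-octave resolution: {q_third:.2f}")  # 4.32
```

The same function gives a Q of about 1.41 (the square root of 2) for full-octave bands, which shows how coarse the octave-band analyzers are by comparison.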

Despite  the  failure  of  the  short-time  Fourier 
transform  to  model  the  essentially  constant-Q  nature  of 
analysis  performed  by  the  ear,  its  two  dimensional  nature 
has  proven  useful  in  many  applications.  Among  these  are 
the phase vocoder [8,9,10], perceptual rate change proposed
by  Flanagan  [1]  and  recently  implemented  by  Portnoff  [11], 
and  various  two-dimensional  modification  experiments 
involving  noise  removal,  feature  isolation  and  enhancement 
and  bandwidth  compression  all  performed  by  Callahan  [7]. 
Both  Portnoff  and  Callahan  noted  the  limitations  imposed  by 
the  constant  bandwidths  in  their  experiments,  and  pointed 
out  the  possible  advantage  inherent  to  a  constant-Q 
implementation  of  their  systems. 

1.3  The  Constant-Q  Alternative 

The  notion  of  constant-Q  signal  analysis  is  not  new. 
The analog spectral analyzer has been performing constant
percentage bandwidth analysis for decades. That constant-Q
analysis could be formalized mathematically was recognized
in 1971 by Gambardella [17], who proposed a "multiple
filter analyzer integral":

$$F(\omega,t) = \int f(\tau)\, h(t-\tau,\omega)\, e^{-j\omega\tau}\, d\tau \tag{1.1}$$

(Note that integration intervals for all integrals in this
work are assumed to be (−∞, ∞) unless otherwise stated.)
This  analysis  integral  is  a  generalization  of  the 
short-time  Fourier  integral  transform  in  the  sense  that  its 
analysis window is a function not only of time but also of
analysis  frequency.  Gambardella  pointed  out  that  certain 
forms  of  this  integral  function  permit  a  reverse  transform, 
and  that  one  particular  form  exhibits  constant-Q  character. 

Related  efforts  directed  at  the  problem  of  constant-Q 
signal  analysis  have  centered  attention  on  the  notions  of 
warping  the  frequency  axis  and  non-uniform  sampling  of  the 
z-transform.  These  efforts  are  reviewed  in  Chapter  3. 

1.4  Contribution  and  Outline  of  this  Work 

The contribution of this work involves formalization
of  a  constant-Q  transform  and  the  definition  of  the 
properties  of  the  transform  as  it  pertains  to  acoustic 
signal  processing.  Attention  has  been  given  to 
mathematical establishment of the properties of both the
forward transform and the reverse transform so that
processing  which  uses  the  transform  may  be  well-understood. 


The  effect  of  spectral  domain  modification,  an  important 
issue  where  signal  processing  is  the  goal  of  analysis,  has 
been  discovered  and  described.  Because,  to  date,  no  form 
of  the  analysis  transform  analogous  in  speed  and  elegance 
to  the  FFT  has  been  found,  care  has  been  taken  to  derive 
and  articulate  relationships  governing  the  sampling  of  the 
constant-Q  spectral  domain.  A  pattern  which  allows  minimal 
sampling  has  been  described,  and  an  algorithm  presented 
whereby  sampled  analysis  may  be  achieved  at  the  expense  of 
a  complex  demodulation  and  fast  convolution  on  each 
analysis channel. The nature and computation of the
spectral magnitude and phase functions have been discussed,
including the problem of spectral phase unwrapping. As an
illustration of the use of the transform pair, the
perception-related problem of time compression and
expansion of speech was solved using the transform, and the
performance of the algorithm in this application was evaluated.

Chapter  2  contains  a  discussion  of  generalized 
short-time  Fourier  transform  analysis  and  synthesis,  as 
well  as  a  discussion  of  the  effects  of  spectral 
modifications.  This  material  is  provided  primarily  for 
reference,  since  many  constant-Q  concepts  are  more  easily 
understood  by  analogy  with  constant  bandwidth  concepts. 

Chapter  3  then  presents  the  constant-Q  transform  in  a 
development  which  parallels  that  in  Chapter  2.  One  of  the 
family  of  possible  reverse  transforms  is  developed.  The 
effect of constant-Q spectral modification is discussed,
and a number of useful transform properties are given.

Chapter  4  handles  a  collection  of  topics  which  do  not 
properly  fit  into  Chapter  3,  but  which  are  of  practical 
importance.  These  include  the  various  implementation 
issues,  such  as  sampling,  filterbank  design,  computation 
schemes,  and  the  nature  and  computation  of  the  constant-Q 
spectral  magnitude  and  phase  functions. 

Chapter  5  describes  the  use  of  the  transform  to  effect 
modification  of  the  rate  of  articulation  of  speech  (not  to 
be confused with scaling the time index). A comparison is
made  between  this  and  previous  work  with  this  problem. 


CHAPTER  2 


THE  SHORT-TIME  FOURIER  TRANSFORM 

2.1  Introduction 

The  mathematics  of  the  short-time  Fourier  integral 
transform  and  its  discrete  counterpart  have  been  clearly 
laid  out  in  a  number  of  standard  sources.  However,  because 
many  of  the  concepts  of  the  following  chapter  closely 
parallel  ideas  encountered  in  such  a  development,  a  brief 
review  of  the  continuous  forward  and  reverse  transforms, 
their  properties,  and  the  effect  of  spectral  modifications 
will  be  presented  here.  We  shall  also  find  it  convenient 
to  introduce  in  this  familiar  development  many  of  the 
symbols  and  terminology  used  throughout  this  work. 

2.2  Short-time  Fourier  Analysis 

The  continuous  Fourier  integral  transform, 

$$F(\omega) = \int f(t)\, e^{-j\omega t}\, dt \tag{2.1}$$

and  its  inverse, 

$$f(t) = \frac{1}{2\pi} \int F(\omega)\, e^{j\omega t}\, d\omega \tag{2.2}$$

have a fundamental limitation: while F(ω) has infinitesimal
frequency  resolution,  it  fails  to  provide  any  information 
about  how  frequency  information  varies  as  a  function  of 
time. The concept of spectral information which changes
with  time  does  not  exist.  This  limitation  is  remedied  by 
weighting  the  time  signal  in  an  area  of  interest  using  a 
function  which  is  generally  smooth  and  which  has  limited 
non-zero  extent.  (Such  a  function  is  often  referred  to  as 
an  analysis  window.  The  Hann  window  is  a  familiar 
example.)  The  weighting  function  imposes,  by  reason  of  its 
non-zero duration, a finite time resolution. A less
obvious effect of forcing the Fourier integral transform to
operate locally is that the frequency resolution of the
transformed signal is no longer infinitesimally fine, since
it has been "smeared" by convolution with the Fourier
transform of the weighting function. If this
resolution-limiting window is allowed to slide along the
time axis, as in the short-time Fourier integral transform,

$$F(\omega,t) = \int f(\tau)\, h(t-\tau)\, e^{-j\omega\tau}\, d\tau \tag{2.3}$$

the transform then becomes a function of two variables and
yields local (finite resolution) information about the
input signal.
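To make the short-time Fourier integral concrete, the following sketch approximates it by a Riemann sum on a fine time grid. The test signal (a 1 kHz cosine), the 25.6 ms Hann window, the grid density and the helper names are all assumptions for illustration. With the window centred on t = 0, the magnitude of F(ω, 0) peaks at the tone frequency, with a height close to half the window area.

```python
import numpy as np

def hann(t, T):
    """Hann window of total length T, centred on t = 0 (zero outside |t| <= T/2)."""
    return np.where(np.abs(t) <= T / 2, 0.5 * (1 + np.cos(2 * np.pi * t / T)), 0.0)

def stft_integral(f, tau, omega, t, T):
    """Riemann-sum approximation of F(omega, t) = int f(tau) h(t - tau) exp(-j omega tau) dtau."""
    dtau = tau[1] - tau[0]
    return np.sum(f * hann(t - tau, T) * np.exp(-1j * omega * tau)) * dtau

fs = 48_000.0                          # integration grid density (assumed)
tau = np.arange(-0.1, 0.1, 1 / fs)     # grid wide enough to cover the window
f = np.cos(2 * np.pi * 1000.0 * tau)   # test signal: a 1 kHz tone (assumed)
T = 0.0256                             # 25.6 ms window

freqs_hz = np.arange(500.0, 1501.0, 10.0)
mags = np.array([abs(stft_integral(f, tau, 2 * np.pi * fh, 0.0, T)) for fh in freqs_hz])
peak_hz = freqs_hz[np.argmax(mags)]
print(peak_hz, mags.max())  # peak at 1000.0 Hz, height close to T/4
```

Sliding t along the time axis and repeating the frequency scan produces the two-dimensional function F(ω, t).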

2.3  Resolution  and  Sampling  Issues 

The terms "time resolution" and "frequency
resolution," used above, require a more formal definition
if they are to be useful in discretization of the
short-time Fourier transform. Suppose the time and
frequency resolutions of constant bandwidth analysis are
defined as the time and frequency intervals over which the
window function and its Fourier transform are
"significant." Clearly this definition involves a degree of
ambiguity (and therefore approximation) in the
identification of the respective intervals. Hence, a
definition is adopted here, equivalent to that used by
Allen [13], which allows precise determination of the time
and frequency resolutions associated with a window
function, h(t). The finite, non-zero extent of h(t) is
defined to be the time resolution, $T_\infty$, of the window. If
by H(ω) we denote the Fourier integral transform of h(t),
the frequency resolution, $F_\infty$, of the analysis is given by
the extent of the principal interval around zero wherein
H(ω) is positive-valued. (The subscripts of $T_\infty$ and $F_\infty$ will
become more useful in Chapter 4. They indicate the
attenuation of the window or its transform at the edges of
the resolution-determining interval.)

The  time  scaling  property  of  the  Fourier  integral 
transform  [14]  guarantees  that  the  product  of  the  time  and 
frequency  resolutions  will  be  a  constant.  Hence,  we  may 
write

$$\kappa = T_\infty F_\infty \tag{2.4}$$

where κ is a constant whose value is a consequence of the
choice of window function and of the definitions of $T_\infty$ and
$F_\infty$. Easily computable values for κ are given in Table A.1
of Appendix A for a few common windows, assuming the
above-stated definitions of $T_\infty$ and $F_\infty$.
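As an illustration of how such a constant can be computed, the sketch below evaluates H(ω) for a Hann window numerically, locates the first zero crossing, and forms κ = T∞F∞ using the definitions adopted above (T∞ is the full non-zero extent; F∞ is the width of the positive principal interval about ω = 0). For the Hann window the transform first vanishes at ω = ±4π/T, so the expected value is 8π. The trapezoidal integration, grid sizes and function names are assumptions of this sketch, not the procedure behind Table A.1.

```python
import numpy as np

def hann_window(t, T):
    """Hann window of total extent T, centred on t = 0 (zero for |t| > T/2)."""
    return np.where(np.abs(t) <= T / 2, 0.5 * (1 + np.cos(2 * np.pi * t / T)), 0.0)

def window_transform(h_vals, t, omega):
    """Trapezoidal estimate of H(omega) = int h(t) exp(-j omega t) dt.
    For a real, even window the sine part integrates to zero, so only the
    cosine part is kept and H(omega) is real."""
    y = h_vals * np.cos(omega * t)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t))

def kappa_for_hann(T=1.0, n_grid=4001, n_omega=4000):
    """kappa = T_inf * F_inf, with T_inf the non-zero extent of h(t) and F_inf the
    width of the principal interval around omega = 0 on which H(omega) > 0."""
    t = np.linspace(-T / 2, T / 2, n_grid)
    h_vals = hann_window(t, T)
    omegas = np.linspace(0.0, 8 * np.pi / T, n_omega)  # scan past the first zero
    H = np.array([window_transform(h_vals, t, w) for w in omegas])
    i = int(np.argmax(H <= 0.0))                        # first sample at or past the zero
    # linear interpolation between the last positive sample and the first non-positive one
    w_zero = omegas[i - 1] + H[i - 1] * (omegas[i] - omegas[i - 1]) / (H[i - 1] - H[i])
    F_inf = 2 * w_zero                                  # interval is symmetric about 0
    return T * F_inf

print(kappa_for_hann() / np.pi)  # close to 8
```

Because κ depends only on the window shape, the result is the same for any choice of T.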

Combined  with  the  celebrated  Nyquist  sampling  theorem, 
this  information  is  sufficient  to  permit  sampling  of  the 
short-time  Fourier  spectral  domain  without  loss  of 
information. In particular, the time-sampling rate must be
at least $F_\infty/2\pi$, and the density of frequency samples
(per unit angular frequency) must be at least $T_\infty/2\pi$.
Hence, if the time and frequency sampling intervals are
respectively Δt and Δf (the latter measured in angular
frequency),

$$\Delta t \le 2\pi F_\infty^{-1} \tag{2.5}$$

$$\Delta f \le 2\pi T_\infty^{-1} \tag{2.6}$$

Thus,  for  instance,  a  25.6  millisecond  Hann  window 
gives  rise  to  a  spectral  domain  which  must  be  sampled  at 
least  every  6.4  milliseconds  in  time  and  every  39.0625 
Hertz  in  frequency  if  information  is  not  to  be  lost. 
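The arithmetic behind the 25.6 millisecond example can be reproduced in a few lines. This sketch assumes κ = 8π for the Hann window (its transform first vanishes at ω = 4π/T) and treats the frequency-sampling step as an angular-frequency interval, as the later notation ω_k = kΔf implies; the step is converted to Hertz only for reporting. The function name is hypothetical.

```python
import math

def stft_sampling_limits(T_inf, kappa):
    """Largest admissible sampling steps for the short-time Fourier spectral
    domain, following kappa = T_inf * F_inf and the sampling bounds on dt and df.
    Returns (dt_max in seconds, df_max in Hz)."""
    F_inf = kappa / T_inf             # frequency resolution in rad/s
    dt_max = 2 * math.pi / F_inf      # time-sampling bound
    dw_max = 2 * math.pi / T_inf      # angular-frequency-sampling bound
    return dt_max, dw_max / (2 * math.pi)

dt_max, df_max_hz = stft_sampling_limits(T_inf=0.0256, kappa=8 * math.pi)
print(f"{dt_max * 1e3:.1f} ms, {df_max_hz:.4f} Hz")  # 6.4 ms, 39.0625 Hz
```

The time step is a quarter of the window length and the frequency step is the reciprocal of the window length, which is why halving the window doubles the required frame rate but halves the number of frequency samples.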

(It should be noted that under special conditions [15],
restrictions on the analysis window function permit
synthesis from undersampled spectral data. In general,
however, and where spectral modifications are to be
performed, synthesis depends on proper spectral sampling as
defined.)

The  discrete  representation  of  the  short-time  Fourier 
transform  will  not  be  presented  here,  but  is  discussed  in 
several sources [7,10]. The sampling theorem is reviewed
primarily  because  a  similar  argument  will  be  needed  in 
connection  with  the  constant-Q  transform,  and  because  the 


notion  of  analysis  at  discrete  frequencies  is  useful  in 
developing  the  following  analogy. 

Suppose  the  short-time  Fourier  transform  is  evaluated 
at a set of discrete frequencies, $\omega_k = k\Delta f$. If for each k
the complex exponential in 2.3 is associated with the input
function, the result is recognized to be a convolution
(denoted throughout this work by the binary operator "*"):

$$F(\omega_k,t) = \left[f(t)\, e^{-j\omega_k t}\right] * h(t) \tag{2.7}$$

In  this  form  of  the  analysis  expression,  the  short-time 
spectrum at any $\omega_k$ is recognized to be a lowpass version of
the  complex  demodulated  input  signal.  A  simple  change  of 
variables  in  2.3  allows  the  complex  exponential  to  be 
associated  with  the  window  function  in  the  convolution. 
This  results  in  still  another  form  of  2.3. 

$$F(\omega_k,t) = e^{-j\omega_k t}\left\{ f(t) * h(t)\, e^{j\omega_k t} \right\} \tag{2.8}$$

Because the various bandpass filters resulting from the
complex modulation of h(t) by $\omega_k$ form a continuous bank,
this form of 2.3 has been called the filter bank analogy.
It  is  shown  schematically  in  Figure  2.1. 
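The equivalence of the demodulate-then-lowpass form (2.7) and the bandpass-then-demodulate form (2.8) can be checked on sampled data, where both become discrete convolutions. In this sketch the random input, the 33-point Hann window and the analysis frequency of 0.2π rad/sample are arbitrary assumptions; the two channel outputs agree to rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(256)            # arbitrary test input
h = np.hanning(33)                      # lowpass analysis window (assumed)
n = np.arange(len(f) + len(h) - 1)      # output indices of the full convolution
wk = 2 * np.pi * 0.1                    # one analysis frequency, rad/sample (assumed)

# Form (2.7): demodulate the input by exp(-j wk n), then lowpass filter with h.
lhs = np.convolve(f * np.exp(-1j * wk * np.arange(len(f))), h)

# Form (2.8): filter with the modulated (bandpass) window, then demodulate.
bandpass = h * np.exp(1j * wk * np.arange(len(h)))
rhs = np.exp(-1j * wk * n) * np.convolve(f, bandpass)

print(np.allclose(lhs, rhs))  # True
```

Repeating the second computation for each ω_k yields one bandpass channel per analysis frequency, which is the filterbank picture of Figure 2.1.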

2.4  Short-time  Fourier  Synthesis 

The  nature  of  the  short-time  Fourier  synthesis 
integral  is  suggested  by  observing  that  the  analysis 
presents  at  every  instant  a  frequency-shifted  set  of 
contiguous lowpass representations of the original, and
that, re-shifted, these signals should add up to produce a
scaled version of the original. This proposed synthesis is
shown schematically in Figure 2.1.

Figure 2.1. Filterbank analogy to short-time Fourier
analysis and synthesis. Synthesis via the filterbank
summation (FBS) method is shown here. As elsewhere, ν is the
Fourier frequency parameter associated with the transformed
spectral time axis, and each $\omega_k$ is an analysis center
frequency measured along the ω-axis.

In integral form the synthesis is,

$$f(t) = \frac{1}{2\pi h(0)} \int F(\omega,t)\, e^{j\omega t}\, d\omega \tag{2.9}$$

Equation  2.9  is  not,  however,  the  most  general  form  for 
short-time  Fourier  synthesis,  but  is  in  fact  a  particular 
case  of  the  more  general  form  given  below. 

$$f(t) = \frac{1}{2\pi \langle g(t),h(t)\rangle} \iint F(\omega,\tau)\, g(t-\tau)\, e^{j\omega t}\, d\omega\, d\tau \tag{2.10}$$

where ⟨g(t),h(t)⟩ is the inner product,

$$\langle g(t),h(t)\rangle = \langle g,h\rangle = \int g(t)\, h(t)\, dt \tag{2.11}$$

of g(t) and h(t), and g(t) is a window function having
restrictions similar to those applying to the analysis
window, h(t). That this more general form of synthesis
provides an analysis-synthesis identity in the absence of
spectral modification is shown in Appendix B.

The significance of the existence of a more general
form of 2.9 involving a synthesis window will be discussed
in Section 2.5. It suffices for now to identify two
members of the family of synthesis forms. These two forms
result from specifying the synthesis window to be either of
the limiting cases, g(t)=δ(t) or g(t)=1. (Here and
throughout this work "δ(t)" will represent the Dirac delta
"function." Although technically not a function, the Dirac
delta provides notational and operational short-cuts when
used with care. Its properties and use are described by
Papoulis [14] and Lighthill [16].) In the case
where g(t)=1, 2.10 reduces to

$$f(t) = \frac{\iint F(\omega,\tau)\, e^{j\omega t}\, d\omega\, d\tau}{2\pi \int h(\tau)\, d\tau} \tag{2.12}$$

If  the  window  area  is  constrained  to  equal  unity,  this 
expression  is  recognized  as  a  continuous  version  of  the 
overlap-add  (OLA)  synthesis  proposed  by  Allen  [13]. 

The more familiar synthesis expression, 2.9, which for
reasons described above is called the filter bank summation
(FBS) synthesis, is derived from 2.10 by setting g(t)=δ(t).
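Discrete analogues of the FBS synthesis (2.9) and the OLA synthesis (2.12) can be checked numerically. The sketch below samples frequency at ω_k = 2πk/N and evaluates the analysis by direct summation rather than an FFT, for clarity. The window length 2M+1 = 17, N = 32 channels and the helper names are assumptions. Because frequency is sampled, each synthesis is exact only when time aliasing is impossible: N > M suffices for FBS, while OLA needs N > 2M and a frame sum restricted to frames overlapping the output sample (the frames on which h(τ − t) is non-zero in the continuous case).

```python
import numpy as np

def stft_direct(f, h, N):
    """Discrete analogue of (2.3) at the N frequencies w_k = 2*pi*k/N:
        F[k, n] = sum_m f[m] * h[n - m] * exp(-j * w_k * m).
    `h` has odd length 2M+1 and holds h[-M], ..., h[M]."""
    M = len(h) // 2
    L = len(f)
    w = 2 * np.pi * np.arange(N) / N
    F = np.zeros((N, L), dtype=complex)
    for n in range(L):
        for lag in range(-M, M + 1):           # lag = n - m
            m = n - lag
            if 0 <= m < L:
                F[:, n] += f[m] * h[lag + M] * np.exp(-1j * w * m)
    return F

def fbs_synthesis(F, h):
    """Discrete analogue of (2.9), i.e. g = delta:
        f[n] = sum_k F[k, n] exp(j w_k n) / (N h[0])."""
    N, L = F.shape
    M = len(h) // 2
    w = 2 * np.pi * np.arange(N) / N
    phasors = np.exp(1j * np.outer(w, np.arange(L)))   # exp(j w_k n)
    return np.real(np.sum(F * phasors, axis=0)) / (N * h[M])

def ola_synthesis(F, h):
    """Discrete analogue of (2.12), i.e. g = 1:
        f[n] = sum_tau sum_k F[k, tau] exp(j w_k n) / (N sum_l h[l]),
    with tau restricted to n-M, ..., n+M so distant frames cannot alias in.
    Computed for interior samples only."""
    N, L = F.shape
    M = len(h) // 2
    w = 2 * np.pi * np.arange(N) / N
    out = np.zeros(L)
    for n in range(M, L - M):
        frames = F[:, n - M:n + M + 1]
        out[n] = np.real(np.sum(frames * np.exp(1j * w * n)[:, None])) / (N * np.sum(h))
    return out

rng = np.random.default_rng(1)
f = rng.standard_normal(128)
h = np.hanning(17)             # M = 8; centre sample h[0] = 1
M = len(h) // 2
F = stft_direct(f, h, N=32)    # N > 2M keeps both syntheses free of time aliasing

f_fbs = fbs_synthesis(F, h)
f_ola = ola_synthesis(F, h)
print(np.allclose(f_fbs, f), np.allclose(f_ola[M:-M], f[M:-M]))  # True True
```

Both syntheses reproduce the input when the spectrum is left unmodified; they differ only once the spectrum is changed, which is the subject of the next section.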

2.5  Effect  of  Spectral  Modifications 

In some applications, where parameter extraction is
the goal, or where complete spectral information is to be
transmitted over a noiseless channel, the effect of
spectral domain modifications is not important. However,
in many applications, modifications to spectral information
occur either unintentionally or as a main feature of the
processing attempted. In these cases, the effect which
spectral domain changes have on the synthesized signal must
be understood. As shown in Section 2.4, the OLA and FBS
synthesis integrals yield an identity when cascaded with
the short-time analysis of 2.3. These synthesis integrals
differ,  however,  in  their  effect  on  the  mapping  between  the 
spectral  domain  and  the  time  domain  if  spectral  domain 


signal  modifications  are  allowed.  The  reason  for,  and 
implications  of  this  behavior  are  the  subjects  of  the 
present  section. 

Recall from Sections 2.2 and 2.3 that the short-time Fourier
transform of a signal has finite time and frequency
resolution, given by $T_\infty$ and $F_\infty$. From this fact alone, it
is clear that the set of short-time Fourier spectra defined
by 2.3 does not, for any given h(t), include every possible
complex-valued, two-dimensional function. In other words,
the  mapping  performed  by  2.3  from  the  complex  line  to  the 
complex  plane  is  not  onto.  The  situation  is  shown 
graphically  in  Figure  2.2  where  the  shaded  area  is  the 
subplane  reachable  from  the  complex  line  via  the  short-time 
Fourier  transform  for  any  particular  window,  h(t).  The 
family  of  reverse  mappings  (called  left  inverses  or 
retracts)  specified  by  2.10  maps  the  whole  plane  onto  the 
line.  If  the  portion  of  the  plane  not  reachable  from  the 
line  could  be  excluded  from  our  interest,  simplicity  or 
computational  expediency  could  dictate  our  choice  of 
synthesis  integral  from  among  the  family  implied  by  2.10. 
However,  because  spectral  modification  often  attends 
spectral  domain  processing,  and  because  nearly  all  additive 
noise  as  well  as  many  useful  modifications  map  signals 
outside  the  subplane,  the  effect  of  various  retracts  on  the 
analysis-synthesis system in the presence of spectral
modifications  is  important. 

While many types of spectral modification are possible
(convolutional, multiplicative, additive, etc.), the
effects to be expected from spectral modification and the
method for determining such effects can be conveniently
illustrated by considering multiplicative modification.
The general relationship between an intended spectral
modification and its associated effective modification (see
Figure 2.2) may be established for changes of a particular
form by substituting 2.3 into a version of 2.10 which
reflects the change in question.

Figure 2.2. The effect of spectral modification. [Figure
panel label: FUNCTIONS ON THE COMPLEX PLANE.] The arrows
between the line and the plane indicate mappings available
using the short-time (or the constant-Q) forward and reverse
transformations. The shaded domain delineates the set of
spectra, F(ω,t), reachable from the time domain. Many
spectral modifications map the signals, F(ω,t), to signals,
F'(ω,t), which lie outside of the shaded region. These
"illegal" modifications are mapped to effective "legal"
modifications, F''(ω,t), by synthesis followed by analysis.

Suppose F(ω,τ) in 2.10 is multiplied by the
time-varying function C(ω,τ), as in 2.13.

$$\hat f(t) = \frac{1}{2\pi \langle g,h\rangle} \iint F(\omega,\tau)\, C(\omega,\tau)\, g(t-\tau)\, e^{j\omega t}\, d\omega\, d\tau \tag{2.13}$$

Expanding F(ω,τ) in 2.13, reassociating factors, and
interchanging the order of integration leads to the
following:

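$$\hat f(t) = \frac{1}{\langle g,h\rangle} \iiint f(\xi)\, h(\tau-\xi)\, g(t-\tau)\, \frac{1}{2\pi}\, C(\omega,\tau)\, e^{j\omega(t-\xi)}\, d\omega\, d\xi\, d\tau \tag{2.14}$$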

In this, the one-dimensional inverse Fourier transform of
C(ω,τ) with respect to ω, $c(t-\xi,\tau) = \frac{1}{2\pi}\int C(\omega,\tau)\, e^{j\omega(t-\xi)}\, d\omega$, is recognized.

Interchanging  the  order  of  integration  once  more  yields 

$$\hat f(t) = \frac{1}{\langle g,h\rangle} \int f(\xi) \int c(t-\xi,\tau)\, g(t-\tau)\, h(\tau-\xi)\, d\tau\, d\xi \tag{2.15}$$

If the inner integral is rewritten symbolically as
$\tilde c(t-\xi,\xi)$, 2.15 becomes the superposition integral,

$$\hat f(t) = \frac{1}{\langle g,h\rangle} \int f(\xi)\, \tilde c(t-\xi,\xi)\, d\xi \tag{2.16}$$


If C(ω,τ) is momentarily constrained to be a function of ω
only, 2.15 is seen to be a simple convolution, a result
expected from the convolution property of the Fourier
transform. If the time dependence is readmitted,
interpretation of 2.15 becomes more difficult, but is
possible if g(t) is sufficiently constrained.

As  examples  of  the  interpretation  of  2.15,  we  will 
consider the particular cases of g(t) mentioned in Section
2.4. Suppose, for instance, that g(t)=δ(t) as in FBS
synthesis.  Then, 


$$\tilde c(t-\xi,\xi) = c(t-\xi,t)\, h(t-\xi) \tag{2.17}$$


The  intended  modification  has  been  time-limited  by  h(t) 
(blurred  in  the  frequency  domain)  but  is  seen  to  "take 
effect"  instantaneously  in  time.  This  result  matches  the 
behavior  described  by  Allen  and  Rabiner  [15]  for  spectral 
modifications  made  prior  to  FBS  synthesis.  In  the  case  of 
OLA synthesis (that is, when g(t)=1), the kernel of 2.16 becomes

$$\tilde c(t-\xi,\xi) = \int c(t-\xi,\tau)\, h(\tau-\xi)\, d\tau \tag{2.18}$$

Note  that,  in  contrast  to  the  FBS  result,  the  intended 
modification is smeared in time (band-limited by h(t)). A
more  intuitive  description  of  the  above  effects  is  taken  up 
in  the  second  section  of  Appendix  B. 
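The FBS result (2.17) can be illustrated with a discrete, time-invariant modification. In the sketch below (all parameters assumed), the intended modification is a pure delay of d samples, C(ω_k) = exp(-jω_k d). According to the discrete analogue of 2.17, the effective modification is c(l)h(l)/h(0): the delay takes effect immediately, but it is weighted by h(d)/h(0), because the intended modification has been time-limited by the analysis window. The helper functions repeat those of a direct-summation STFT and FBS synthesis so the block is self-contained; their names are hypothetical.

```python
import numpy as np

def stft_direct(f, h, N):
    """F[k, n] = sum_m f[m] h[n - m] exp(-j w_k m), w_k = 2*pi*k/N;
    `h` (odd length 2M+1) holds h[-M], ..., h[M]."""
    M = len(h) // 2
    L = len(f)
    w = 2 * np.pi * np.arange(N) / N
    F = np.zeros((N, L), dtype=complex)
    for n in range(L):
        for lag in range(-M, M + 1):        # lag = n - m
            m = n - lag
            if 0 <= m < L:
                F[:, n] += f[m] * h[lag + M] * np.exp(-1j * w * m)
    return F

def fbs_synthesis(F, h):
    """FBS synthesis (g = delta): f[n] = sum_k F[k, n] exp(j w_k n) / (N h[0])."""
    N, L = F.shape
    M = len(h) // 2
    w = 2 * np.pi * np.arange(N) / N
    phasors = np.exp(1j * np.outer(w, np.arange(L)))
    return np.real(np.sum(F * phasors, axis=0)) / (N * h[M])

rng = np.random.default_rng(2)
f = rng.standard_normal(128)
h = np.hanning(17)                # M = 8, h[0] = 1
M, N = len(h) // 2, 32
d = 3                             # hypothetical intended delay in samples (d <= M)
w = 2 * np.pi * np.arange(N) / N
C = np.exp(-1j * w * d)           # time-invariant modification C(w_k): a pure delay
y = fbs_synthesis(stft_direct(f, h, N) * C[:, None], h)

# Discrete analogue of (2.17): the effective modification is c(l) h(l) / h(0),
# so the delay survives but is weighted by h(d) / h(0).
expected = np.zeros_like(f)
expected[d:] = (h[M + d] / h[M]) * f[:-d]
print(np.allclose(y, expected))  # True
```

Choosing M < d < N - M makes y vanish altogether, since the effective modification is confined to the support of h.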
The implications of the above results are important.
Although the FBS and OLA forms of the general synthesis of
2.10 are the most practical, they may not always exhibit the desired
spectral  modification  behavior  when  changes  are  to  be  made 
in the spectral domain. Also, because time or frequency
limiting of intended modifications may occur, smearing in
the Fourier transform domain may cause the modification
function to extend beyond the limits implied by the spectral
time or frequency sampling densities. Hence, care must be
taken to sample densely enough in time and frequency to
prevent time or frequency aliasing due to spectral
modification.
CHAPTER 3


THE  CONSTANT-Q  TRANSFORM  (CQT) 

3.1  Introduction 

The  purpose  of  the  present  chapter  is  to  introduce,  in 
a  development  similar  to  that  of  Chapter  2,  the  constant-Q 
transform.  In  this  development,  the  forward  and  reverse 
transforms,  their  interpretations,  the  effect  of  spectral 
domain  signal  modifications,  and  some  basic  transform 
properties  will  be  presented.  An  attempt  has  been  made  to 
describe  concepts  in  terms  which  facilitate  comparison  with 
similar  constant  bandwidth,  short-time  Fourier  transform 
concepts.

3.2  Constant-Q  Fourier  Analysis 

Gambardella  [17,18]  has  proposed  a  generalized 
short-time  Fourier  analysis  integral  for  continuous  time 
signals  in  which  the  observation  window  is  a  function  of 
both  the  time  and  frequency  parameters  of  the  analysis. 

$$F(\omega,t) = \int f(\tau)\,h(t-\tau,\omega)\,e^{-j\omega\tau}\,d\tau \qquad (3.1)$$

The conventional short-time Fourier integral transform and
the standard Fourier integral transform can be considered
to be special cases of this transform, obtained from 3.1
when $h(t-\tau,\omega)$ equals $h(t-\tau)$ or when $h(t-\tau,\omega)$ equals
unity, respectively. However, as noted by Kajiya [19], a very
interesting  member  of  this  transform  family  arises  if  the 
window,  and  therefore  the  complex  transform  kernel,  is  a 
function  of  the  product  of  time  and  frequency.  For  any 
analysis  frequency,  the  resulting  transform's  window  length 
in  analysis  wavelengths,  and  therefore  the  number  of  cycles 
of  the  complex  kernel  sinusoid,  is  a  global  analysis 
constant.  Hence,  the  measurement  of  frequency  content 
obtained  by  integrating  the  product  of  the  sinusoid  and  the 
signal  is  always  the  result  of  estimation  over  the  same 
length  in  wavelengths  of  the  frequency  in  question.  This 
produces  the  time  and  frequency  resolution  effects  expected 
from  a  constant  percentage  bandwidth  transform.  To  see 
more  clearly  that  this  is  true,  the  constant-Q  transform, 
given  in  3.2, 

$$F(\omega,t) = \int f(\tau)\,h\big((t-\tau)\omega\big)\,e^{-j\omega\tau}\,d\tau \qquad (3.2)$$

will  be  described  in  the  context  of  a  filterbank  analogy 
similar  to  that  used  in  Chapter  2  with  the  short-time 
Fourier  transform. 

Suppose the constant-Q spectrum is sampled at a set of
frequencies, $\omega_k$, whose spacing does not exceed the upper
limit imposed by the analysis resolution of the transform
at each frequency. (We shall have more to say about
sampling the constant-Q spectrum in Chapter 4.) Then, if
for each $\omega_k$, 3.2 is recognized as a convolution, it may be
written as

$$F(\omega_k,t) = \left(f(t)\,e^{-j\omega_k t}\right) * h(\omega_k t) \qquad (3.3)$$

As in the constant bandwidth case, a simple change of
variables in 3.2 leads to an alternative form of the
analysis integral.

$$F(\omega_k,t)\,e^{j\omega_k t} = f(t) * \left(h(\omega_k t)\,e^{j\omega_k t}\right) \qquad (3.4)$$

Fourier transforming both sides of this equation (and
invoking the convolution property of the Fourier integral
transform), 3.4 can be rewritten as

$$\hat{F}(\omega_k,\nu-\omega_k) = F(\nu)\,H\!\left(\frac{\nu-\omega_k}{\omega_k}\right)\frac{1}{|\omega_k|} \qquad (3.5)$$

Here $\nu$ is the Fourier frequency parameter, and $F(\nu)$ and
$H(\nu)$ are the Fourier integral transforms of $f(t)$ and $h(t)$,
respectively. Also, $\hat{F}(\omega_k,\nu)$ is the Fourier integral
transform of $F(\omega_k,t)$ with respect to $t$ (again the Fourier
frequency parameter is $\nu$). The right-hand expression
clearly indicates the filterbank behavior of the transform.
At each analysis frequency, the input signal is linearly
filtered by a basic lowpass filter which has been frequency
shifted, then amplitude and frequency scaled by the
analysis frequency, $\omega_k$. If each filterbank output is
subsequently frequency shifted by $-\omega_k$, the result is that
given by 3.4 and shown schematically in Figure 3.1. The
difference, then, between the filterbank interpretations of
the constant bandwidth and constant-Q transforms lies in
the frequency stretching and amplitude scaling of the
bandpass filters of the constant-Q filterbank. This
difference is crucial, however. The frequency resolution,
$F(\omega_k)$, of the kth analysis filter is shown in 3.4 to be
directly proportional to its center frequency, $\omega_k$. A bank
of such filters is shown in Figure 3.2. On the other hand,
the temporal extent, $T(\omega_k)$, of the kth analysis filter is
seen in 3.4 to be inversely proportional to $\omega_k$. Hence, the
uncertainty relation which governs time and frequency
resolutions for the short-time Fourier integral transform
also governs the resolution of the constant-Q transform,
though both resolutions are fixed in the former case. This
difference is not unexpected, but it is a fundamental
stimulus for a study of the constant-Q transform as a model
for auditory analysis.
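The filterbank reading of 3.3 can be tried on sampled data. The Python sketch below is a minimal illustration, not the implementation used in this research: it assumes a unit sampling period (so $\omega_k$ is in radians per sample) and a zero-phase Hann window spanning a fixed number of kernel periods, set by a hypothetical `cycles` parameter. Each channel is computed as complex demodulation followed by direct convolution with $h(\omega_k t)$. The number of window samples halves when $\omega_k$ doubles, which is the inverse scaling of temporal extent described above.

```python
import cmath
import math

def hann_window(x, cycles):
    """Zero-phase Hann window h(x); x is measured in radians of kernel phase,
    so the window spans `cycles` periods of the kernel sinusoid (assumption)."""
    half = math.pi * cycles
    if abs(x) > half:
        return 0.0
    return 0.5 * (1.0 + math.cos(x / cycles))

def window_support(omega_k, cycles):
    """Number of samples spanned by h(m * omega_k); shrinks as 1 / omega_k."""
    half = math.pi * cycles
    return 2 * int(half / omega_k) + 1

def cq_channel(signal, omega_k, n, cycles):
    """F(omega_k, n) in the form of eq. 3.3: demodulate f(m) by
    exp(-j omega_k m), convolve with h(omega_k m), read the output at n."""
    half_len = int(math.pi * cycles / omega_k)
    acc = 0j
    for m in range(n - half_len, n + half_len + 1):
        if 0 <= m < len(signal):
            demodulated = signal[m] * cmath.exp(-1j * omega_k * m)
            acc += demodulated * hann_window((n - m) * omega_k, cycles)
    return acc
```

For a cosine at 0.5 radians per sample and `cycles = 8`, the channel tuned to 0.5 responds strongly, while the channel at 1.0 is nearly silent because the frequency offsets fall near nulls of the Hann response.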

3.3  Other  Schemes  for  Non-uniform  Bandwidth  Analysis 

The importance of the above behavior, although it
trivially arises from 3.2, cannot be overemphasized.
In recognition of the fundamental importance of non-uniform
frequency analysis, several schemes have appeared in the
literature by which Fourier frequency information is
sampled at frequency intervals which become wider as
frequency increases. A very simple scheme involves
sampling the z-transform at non-uniformly spaced points
along the unit circle. The recognition that this may cause
the highest frequencies to be undersampled suggests the
possibility of somehow representing local unsampled
information in the samples which are taken. This is
typically attempted by computing a weighted average along
the frequency axis of the uniformly sampled short-time
Fourier transform in the neighborhood of each new frequency
sample. Clearly, reducing the frequency resolution of each
sample does not produce the additional time samples required
by the implied increase in time resolution, so that unless the
short-time Fourier transform is initially oversampled by
the amount necessary to produce adequate time resolution
after frequency averaging, the information surrendered to
the average is lost as surely as if no averaging had been
performed. However, with proper attention to sampling
issues, this algorithm can be shown capable of producing
results equivalent to those formalized in 3.5. Another
method is that explained by Oppenheim, Johnson and
Steiglitz [20], wherein a sampled input function is passed
through a unity-magnitude shift-invariant network which
produces another sequence whose Fourier transform is
related to the Fourier transform of the original sequence
by a change of frequency variable. The practical
constraint of Fourier transforming the modified sequence
using a finite-length DFT necessitates windowing the time
data. This windowing corresponds to uniform smearing of
samples along the new non-uniform frequency axis. Hence,
the bandwidth, or frequency resolution, of each frequency
sample is related to its center frequency and to the
frequency domain distortion function. This windowing step
gives the method the appearance of constant-Q analysis.
However, as will be shown in Chapter 4, a properly sampled
constant-Q spectrum has exponentially spaced samples.
Unfortunately, the set of frequency axis distortions
available to the method does not include a logarithmic
transformation. Hence, the method is not truly constant-Q.

Another method, due to Helms [21], approximates the
Laplace transform at exponentially spaced frequency
intervals, and produces a representation wherein, as
frequency increases, the ratio of frequency to bandwidth
increases. However, the method is only asymptotically
constant-Q.

3.4  Constant-Q  Fourier  Synthesis 

A  condition  necessary  to  the  general  usefulness  of  any 
analysis  scheme  is  that  the  analysis  be  reversible. 
Schemes  wherein  the  uncertainty  relation  is  violated  are 
destructive  of  information  and  hence  analysis  performed 
using  these  schemes  is  not  reversible.  The  reversibility 
of  the  constant-Q  transform  will  be  discussed  in  this 
section,  and  a  reverse  transform  given. 

As for the short-time Fourier integral transform (and
for the same reason), a true, two-sided inverse does not
exist for the constant-Q transform. Rather, there exists a
family of left inverses, or retracts, which map the subspace
of complex-valued, two-dimensional functions reachable from
the complex line via 3.2 back to the complex line.

The  nature  of  one  member  of  this  family  of  reverse 
mappings  is  suggested  by  the  observation  that  in  the 
frequency  sampled  analog  of  3.5  the  various  filterbank 
outputs  are  frequency-shifted  outputs  of  a  bank  of 
contiguous  bandpass  filters.  To  be  sure,  the  filters  are 
not  of  uniform  width  or  amplitude;  however,  the  information 
is all there. This suggests the existence of a continuous
synthesis integral of the form

$$f(t) = \frac{1}{k}\int F(\omega,t)\,e^{j\omega t}\,d\omega \qquad (3.6)$$

wherein the various complex demodulated filterbank outputs
of 3.4 are simply remodulated, summed, and normalized by a
constant, k. (The value of k will not be defined at this
point.  It  is  related  to  the  analysis  window.)  As  an  aid  to 
reader intuition, a plausibility argument for this
synthesis, also referred to herein as filterbank summation
(FBS) synthesis, will be given. One way to demonstrate the
overall effect of filterbank analysis and synthesis is to
compute the frequency response of the entire
analysis-synthesis system. Referring to Figure 3.1, the
responses of the various filters of the constant-Q
filterbank are given by

$$\phi(\omega,\nu) = H\!\left(\frac{\nu-\omega}{\omega}\right)\frac{1}{|\omega|} \qquad (3.7)$$

The overall response of the filterbank may be computed as
the sum of the component responses. Such a sum, shown in
Figure 3.3 for a discrete-frequency analysis-synthesis
system, is expressed symbolically as

$$\Phi(\nu) = \int \phi(\omega,\nu)\,d\omega = \int H\!\left(\frac{\nu-\omega}{\omega}\right)\frac{d\omega}{|\omega|} \qquad (3.8)$$

A set of conditions sufficient for the existence of this
integral is described in Appendix C. For common windows,
such as the Hanning window used in this research, the
conditions are equivalent to the requirement that the
zero-frequency component passed by any filter of a
constant-Q filterbank be null. This amounts to a reduction
of the set of allowable values for Q to integer multiples
of some minimum value.

Given the above existence conditions, evaluating 3.8 at
$a\nu$ ($a>0$) and changing the variable of integration from
$\omega$ to $a\omega$ leads to

$$\Phi(a\nu) = \int H\!\left(\frac{a\nu-a\omega}{a\omega}\right)\frac{a\,d\omega}{|a\omega|} \qquad (3.9)$$

which for positive $a$ reduces trivially to $\Phi(\nu)$. Since, as
shown above, the value of $\Phi(a\nu)$ is independent of $a$, we
must conclude that the value of the filterbank sum, $\Phi(\nu)$,
is everywhere a constant. Hence, a properly constructed
constant-Q filterbank responds, to within a multiplicative
constant, as an identity system when its outputs are simply
summed. The actual value of this multiplicative constant,
k (occurring in 3.6), is ordinarily difficult to derive
either analytically or numerically. In the author's
implementations, the constant was determined empirically.

Figure 3.3. Discrete-frequency constant-Q filterbank
response (uniform sampling). The passband ripple (b), which
has a maximum peak-to-peak amplitude of about …,
disappears as frequency increases due to the oversampling
shown in (a).

At this point, it is useful to note that the above
synthesis may also be performed along a logarithmically
warped frequency axis if $\omega\phi(\omega,\nu)$ is used as the integrand
instead of $\phi(\omega,\nu)$. (Such a filterbank and its sum are
shown in Figure 3.4 for a discrete set of frequencies.) The
proof is simple, and consists of noting that since

$$d(\log\omega) = \frac{1}{\omega}\,d\omega \qquad (3.10)$$

we may rewrite 3.8 as

$$\Phi(\nu) = \int \omega\,\phi(\omega,\nu)\,d(\log\omega) \qquad (3.11)$$

This  form  of  synthesis  is  significant  because,  as  will  be 
seen  in  Chapter  4,  the  scheme  by  which  the  constant-Q 
spectral  domain  is  minimally  sampled  uses  exponentially 
spaced  frequency  samples. 
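The scale-invariance argument of 3.8 through 3.11 can be checked numerically. The sketch below assumes a zero-phase Hann window with a hypothetical `cycles = 8` (an integer, so that $H(-1)=0$ as the existence condition requires) and replaces the integral in 3.11 by a sum over exponentially spaced centers $\omega_k = \omega_{min}R^k$. Because $\omega\phi(\omega,\nu)$ reduces to $H((\nu-\omega)/\omega)$ for $\omega>0$, every channel enters with the same weight $\log R$ on the logarithmic axis. Function names and parameter values are illustrative assumptions.

```python
import math

def hann_response(u, cycles):
    """Real frequency response H(u) of the zero-phase Hann window
    h(x) = 0.5 * (1 + cos(x / cycles)) for |x| <= pi * cycles, else 0.
    Closed form of the integral of h(x) * exp(-j u x) dx."""
    half = math.pi * cycles              # half-length of the window
    a = u * half
    if abs(a) < 1e-12:
        return half                      # H(0): area under the window
    if abs(abs(a) - math.pi) < 1e-9:
        return half / 2                  # removable singularity at |a| = pi
    return half * math.pi ** 2 * math.sin(a) / (a * (math.pi ** 2 - a ** 2))

def log_filterbank_sum(nu, omega_min, ratio, n_bands, cycles):
    """Riemann-sum approximation of eq. 3.11 on the centres
    omega_k = omega_min * ratio**k, with d(log omega) = log(ratio)."""
    dlog = math.log(ratio)
    total = 0.0
    for k in range(n_bands):
        omega_k = omega_min * ratio ** k
        total += hann_response((nu - omega_k) / omega_k, cycles) * dlog
    return total

# Usage: the sum is (nearly) the same constant for different nu.
values = [log_filterbank_sum(nu, 0.01, 1.01, 930, 8) for nu in (0.5, 1.0, 2.0)]
```

With closely spaced centers covering several decades, the three values of $\Phi(\nu)$ agree closely, as the flat filterbank sum predicts.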

Another  form  of  the  constant-Q  synthesis  has  been 
proposed by Kajiya [19] in connection with his
two-dimensional  Mandala  transform  development.  This  more 
general  synthesis  form  is  interesting,  providing  insight 
into  the  nature  of  the  above  mentioned  family  of  retracts 
associated  with  constant-Q  analysis.  However,  the  limiting 
scheme  required  in  the  more  general  synthesis  translates 
less  simply  into  discrete  implementation.  Hence  the  analog 
to  short-time  FBS  synthesis  proposed  in  3.6  was  used  in 
this  research. 

Figure 3.4. Discrete-frequency constant-Q filterbank
(exponentially-spaced sampling). Note the linear
pre-emphasis which causes the amplitudes of the filters in
the filterbank to be uniform. Note also the uniform
passband ripple of about 0.4 dB peak-to-peak magnitude.

3.5 Effect of Spectral Modifications

As in the constant bandwidth case, various members of
the family of reverse transforms map points in the
analysed-signal space which are not reachable via the
forward transform back into the signal space in different
ways. The reason for and nature of this fact should be
clearly understood before processing is attempted on a
spectral domain signal. Unfortunately, the complexity of
the forward and reverse transforms makes derivation and
interpretation of such information difficult for most
modification types. However, an example indicating the
technique of such a derivation and showing the
interpretation of results will be given here.

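Assume that a multiplicative modification which is
constant in time is to be applied to a spectral domain
signal prior to FBS synthesis. Symbolically we write

$$\tilde{f}(t) = \frac{1}{k}\int F(\omega,t)\,G(\omega)\,e^{j\omega t}\,d\omega \qquad (3.12)$$

$$\tilde{f}(t) = \frac{1}{k}\iint f(t-\tau)\,h(\omega\tau)\,e^{-j\omega(t-\tau)}\,d\tau\;G(\omega)\,e^{j\omega t}\,d\omega \qquad (3.13)$$

$$\tilde{f}(t) = \frac{1}{k}\int f(t-\tau)\int G(\omega)\,h(\omega\tau)\,e^{j\omega\tau}\,d\omega\,d\tau \qquad (3.14)$$

$$\tilde{f}(t) = \frac{1}{k}\int f(t-\tau)\,g(\tau)\,d\tau \qquad (3.15)$$

where

$$g(\tau) = \int G(\omega)\,h(\omega\tau)\,e^{j\omega\tau}\,d\omega \qquad (3.16)$$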

Clearly, the effect of such a modification is to filter the
input signal using a linear, time-invariant filter. The
nature of the effective filter, however, depends not only
on the attempted modification function, $G(\omega)$, but, as
suggested by 3.15 and 3.16, also on the analysis window. If
$\mathcal{G}(\nu)$ is used to denote the Fourier integral transform of
$g(\tau)$, the following can be written:

$$\mathcal{G}(\nu) = \int g(\tau)\,e^{-j\nu\tau}\,d\tau \qquad (3.17)$$

$$\mathcal{G}(\nu) = \iint G(\omega)\,h(\omega\tau)\,e^{j\omega\tau}\,d\omega\;e^{-j\nu\tau}\,d\tau \qquad (3.18)$$

$$\mathcal{G}(\nu) = \int G(\omega)\,H\!\left(\frac{\nu-\omega}{\omega}\right)\frac{d\omega}{|\omega|} \qquad (3.19)$$

The effective modification is the result of a stylized
superposition involving the intended modification and the
analysis window function. This operation may be viewed as
a weighted sum of the filters in the filterbank. The
result of such an operation, even though the weighting
function may have arbitrarily fine frequency resolution, is
constrained to have the frequency resolution dictated by
the analysis filterbank. Hence, effective modifications
are constant-Q versions of intended modifications.
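Equation 3.19 can be evaluated directly to see this constant-Q smoothing. As a sketch under stated assumptions, the code below takes an ideal intended modification $G(\omega)$ equal to one on the narrow band $[1, 1.05]$ (arbitrary units) and zero elsewhere, a zero-phase Hann window with a hypothetical `cycles = 8`, and a simple midpoint rule; the constant $1/k$ is ignored and the names are illustrative. Just outside the intended band the effective response remains a large fraction of its in-band value, while far from the band it is negligible.

```python
import math

def hann_response(u, cycles):
    """Real frequency response H(u) of the zero-phase Hann window
    h(x) = 0.5 * (1 + cos(x / cycles)) for |x| <= pi * cycles, else 0."""
    half = math.pi * cycles
    a = u * half
    if abs(a) < 1e-12:
        return half
    if abs(abs(a) - math.pi) < 1e-9:
        return half / 2
    return half * math.pi ** 2 * math.sin(a) / (a * (math.pi ** 2 - a ** 2))

def effective_modification(nu, g_low, g_high, cycles, steps=400):
    """Midpoint-rule evaluation of eq. 3.19 for G(omega) = 1 on
    [g_low, g_high] (g_low > 0) and 0 elsewhere:
    integral of H((nu - omega) / omega) / omega over the passband."""
    width = (g_high - g_low) / steps
    total = 0.0
    for i in range(steps):
        omega = g_low + (i + 0.5) * width
        total += hann_response((nu - omega) / omega, cycles) / omega * width
    return total

centre = effective_modification(1.025, 1.0, 1.05, 8)   # inside the band
near = effective_modification(1.1, 1.0, 1.05, 8)       # just outside
far = effective_modification(2.0, 1.0, 1.05, 8)        # an octave away
```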

The situation becomes slightly more complicated if the
intended modification is allowed to vary as a function of
time and frequency. This new multiplicative modifier is
denoted below as $C(\omega,t)$.

$$\tilde{f}(t) = \frac{1}{k}\iint f(t-\tau)\,h(\omega\tau)\,e^{-j\omega(t-\tau)}\,d\tau\;C(\omega,t)\,e^{j\omega t}\,d\omega \qquad (3.20)$$

$$\tilde{f}(t) = \frac{1}{k}\int f(t-\tau)\,\tilde{C}(\tau,t)\,d\tau \qquad (3.21)$$

where

$$\tilde{C}(\tau,t) = \int C(\omega,t)\,h(\omega\tau)\,e^{j\omega\tau}\,d\omega \qquad (3.22)$$

This result parallels the stationary result given above,
except that the effective filter, $\tilde{C}(\tau,t)$, now varies with
$t$ and is combined by superposition with the input signal.
In the frequency domain, with $\mathcal{C}(\nu,t)$ denoting the Fourier
integral transform of $\tilde{C}(\tau,t)$ with respect to $\tau$,

$$\mathcal{C}(\nu,t) = \int C(\omega,t)\,H\!\left(\frac{\nu-\omega}{\omega}\right)\frac{d\omega}{|\omega|} \qquad (3.23)$$

Again, the intended modification acts as a weighting
function on the various filterbank functions. In 3.23,
however, the weighting function is permitted to change
instantaneously in time, a result analogous to constant
bandwidth FBS synthesis.

3.6 Transform Properties

This section states, or points out the absence of, a few
properties of the constant-Q transform which are analogous
to the usual Fourier transform properties. In what
follows, define

$$f(t) \rightarrow F(\omega,t) \qquad (3.24)$$

to be an equivalent statement to equation 3.2.

3.6.1  Linearity  Property 

If $F_1(\omega,t)$ and $F_2(\omega,t)$ are the constant-Q transforms
of $f_1(t)$ and $f_2(t)$, respectively, and $a_1$, $a_2$ are two
arbitrary constants, then

$$a_1 f_1(t) + a_2 f_2(t) \rightarrow a_1 F_1(\omega,t) + a_2 F_2(\omega,t) \qquad (3.25)$$

The  proof  is  a  trivial  result  of  the  linearity  of  the 
integral  operator. 

3.6.2 Time Scaling Property

If $a$ is a real constant not equal to zero,

$$f(at) \rightarrow F(\omega/a,\,at)/|a| \qquad (3.26)$$

To prove this property, assume that $\tilde{F}(\omega,t)$ is the
constant-Q transform (CQT) of $f(at)$. Then,

$$\tilde{F}(\omega,t) = \int f(a\tau)\,h\big((t-\tau)\omega\big)\,e^{-j\omega\tau}\,d\tau \qquad (3.27)$$

With a change of variables,

$$\tilde{F}(\omega,t) = \int f(\tau)\,h\!\left((at-\tau)\frac{\omega}{a}\right)e^{-j\frac{\omega}{a}\tau}\,\frac{d\tau}{|a|} \qquad (3.28)$$

$$\tilde{F}(\omega,t) = F(\omega/a,\,at)/|a| \qquad (3.29)$$

The absolute value results because, for $a$ less than zero,
the limits of integration are reversed by the change of
variable. Notice that the short-time Fourier
transform does not share this property.

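3.6.3 Time Shifting Property

If $t_0$ is a real constant,

$$f(t-t_0) \rightarrow e^{-j\omega t_0}F(\omega,\,t-t_0) \qquad (3.30)$$

To prove this property, assume that $\tilde{F}(\omega,t)$ is the
CQT of $f(t-t_0)$. Then,

$$\tilde{F}(\omega,t) = \int f(\tau-t_0)\,h\big((t-\tau)\omega\big)\,e^{-j\omega\tau}\,d\tau \qquad (3.31)$$

With a change of variables,

$$\tilde{F}(\omega,t) = e^{-j\omega t_0}\int f(\tau)\,h\big(\omega((t-t_0)-\tau)\big)\,e^{-j\omega\tau}\,d\tau \qquad (3.32)$$

$$\tilde{F}(\omega,t) = e^{-j\omega t_0}F(\omega,\,t-t_0) \qquad (3.33)$$

3.6.4 Conjugate Property

If the operation of complex conjugation is denoted by the
superscript *, and if we assume $h(t)$ to be real and even,

$$f^*(t) \rightarrow F^*(-\omega,t) \qquad (3.34)$$

The proof is as follows:

$$\tilde{F}(\omega,t) = \int f^*(\tau)\,h\big((t-\tau)\omega\big)\,e^{-j\omega\tau}\,d\tau \qquad (3.35)$$

$$\tilde{F}(\omega,t) = \left(\left(\int f^*(\tau)\,h\big((t-\tau)\omega\big)\,e^{-j\omega\tau}\,d\tau\right)^*\right)^* \qquad (3.36)$$

Moving the inner conjugation inside the integral ($h$ is real),

$$\tilde{F}(\omega,t) = \left(\int f(\tau)\,h\big((t-\tau)\omega\big)\,e^{j\omega\tau}\,d\tau\right)^* \qquad (3.37)$$

and with $h(\omega(t-\tau)) = h(-\omega(t-\tau))$,

$$\tilde{F}(\omega,t) = F^*(-\omega,t) \qquad (3.38)$$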


3.6.5  Symmetry  Properties 

If $f(t)$ is real, and if $h(t)$ is real and even, then
$F(\omega,t)$ is conjugate symmetric with respect to $\omega$ (that is,
$F(\omega,t) = F^*(-\omega,t)$).

If $f(t)$ is imaginary, with $h(t)$ as above, then $F(\omega,t)$
is conjugate anti-symmetric with respect to $\omega$ (that is,
$F(\omega,t) = -F^*(-\omega,t)$).

Both  of  the  above  follow  directly  from  linearity  and 
the  conjugation  property. 
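The time scaling, time shifting, and conjugate symmetry properties can be spot-checked by evaluating 3.2 with a midpoint rule. The sketch below assumes a zero-phase (real, even) Hann window spanning a hypothetical `cycles` kernel periods and an arbitrary smooth real test signal; neither choice comes from the text. Because the quadrature grids on the two sides of each property map onto one another under the same changes of variables used in the proofs, the two sides agree to rounding error.

```python
import cmath
import math

def hann(x, cycles):
    """Zero-phase (real, even) Hann window spanning `cycles` kernel periods."""
    half = math.pi * cycles
    return 0.5 * (1.0 + math.cos(x / cycles)) if abs(x) <= half else 0.0

def cqt(f, omega, t, cycles=4, steps=2000):
    """Midpoint-rule evaluation of eq. 3.2,
    F(omega, t) = integral of f(tau) h((t - tau) omega) exp(-j omega tau) dtau,
    over the window support |t - tau| <= pi * cycles / |omega|."""
    reach = math.pi * cycles / abs(omega)
    width = 2.0 * reach / steps
    total = 0j
    for i in range(steps):
        tau = t - reach + (i + 0.5) * width
        total += f(tau) * hann((t - tau) * omega, cycles) * cmath.exp(-1j * omega * tau)
    return total * width

def f(x):
    """Smooth real test signal (an arbitrary choice for illustration)."""
    return math.exp(-x * x) * math.cos(2.0 * x)

omega, t, t0 = 3.0, 0.4, 0.7
# Time scaling (3.26) for a positive and a negative scale factor.
for a in (2.0, -1.5):
    lhs = cqt(lambda x: f(a * x), omega, t)
    rhs = cqt(f, omega / a, a * t) / abs(a)
    print("scaling", a, abs(lhs - rhs))
# Time shifting (3.30).
shift_err = abs(cqt(lambda x: f(x - t0), omega, t)
                - cmath.exp(-1j * omega * t0) * cqt(f, omega, t - t0))
# Conjugate symmetry for real f (Section 3.6.5).
sym_err = abs(cqt(f, omega, t) - cqt(f, -omega, t).conjugate())
```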

3.6.6  Other  Properties 

The  independent  frequency  shifting  and  scaling 
properties  normally  associated  with  the  Fourier  integral 
and  short-time  Fourier  integral  transforms  do  not  have 
simple constant-Q counterparts. Also, the convolution
property, which is absent from the short-time Fourier
transform, does not exist for the CQT.


CHAPTER  4 


IMPLEMENTATION OF CONSTANT-Q ANALYSIS AND SYNTHESIS

4.1  Introduction 

As  with  the  short-time  Fourier  transform,  the 
usefulness  of  the  constant-Q  transform  depends  on  the 
existence  of  theory  and  algorithms  which  enable  it  to  be 
applied  to  discrete  data.  This  need  has  been  met  in  the 
first  instance  by  the  discrete  Fourier  transform  and  by  the 
fast  Fourier  transform  (FFT)  algorithm.  This  chapter 
considers  the  problem  of  computing  discrete  forward  and 
reverse  constant-Q  transforms  of  discrete  time  data. 
Issues  not  involved  in  a  similar  discussion  of  constant 
bandwidth  transform  implementation  will  be  shown  to  arise. 
The  discrete  theory  and  algorithms  used  in  this  research 
will  be  presented,  as  will  comments  concerning  other 
possible  implementations. 

4.2  Sampling  the  Constant-Q  Spectral  Domain 

As was pointed out in Section 2.2, the schemes by
which the short-time spectral domain may be sampled without
loss of information are limited by the analysis window,
which imposes a constant time and frequency resolution on
the spectral information. The extension of the thinking
formalized in Section 2.2 to the problem of sampling the
constant-Q spectral domain is complicated by the dependence
of $T_\infty(\omega)$ and $F_3(\omega)$ on frequency. The solution of this
problem necessitates the formalization of some simple
ideas. First, define $F_3(\omega)$ to be the -3 decibel frequency
extent of the analysis window function, $h(t)$, and $\theta_3$ to be
the constant product of $T_\infty(\omega)$ and $F_3(\omega)$. Then

$$\theta_3 = T_\infty(\omega)\,F_3(\omega) \qquad (4.1)$$

$$Q = \omega/F_3(\omega) \qquad (4.2)$$

Thus, we have an explicit relationship among frequency and
the time and frequency resolutions at that frequency:

$$T_\infty(\omega) = \theta_3 Q/\omega \qquad (4.3)$$

$$F_\infty(\omega) = \theta_\infty\,\omega/(\theta_3 Q) \qquad (4.4)$$

where $\theta_\infty$ denotes the constant product $T_\infty(\omega)F_\infty(\omega)$.
Table A.1 of Appendix A lists these window constants for a
few common window functions. Equations 4.3 and 4.4, combined
as in Section 2.2 with the Nyquist theorem, give rise to
lower bounds on the local instantaneous sampling densities
along the frequency and time axes, respectively. Thus, for
example, a Hann window spectral domain whose Q is 3.0 is
minimally sampled with a frequency interval of approximately
173.8 Hertz and a time interval of 1.4382 milliseconds at
1000 Hertz. The same domain at 50 Hertz must be sampled at
least every 8.6914 Hertz in frequency, but only every
28.7641 milliseconds in time. In general, if $\Delta t(\omega)$ and
$\Delta f(\omega)$ are respectively the time and frequency sampling
intervals at $\omega$, then

$$\Delta t(\omega) \le 2\pi/F_\infty(\omega) = 2\pi\theta_3 Q/(\theta_\infty\,\omega) \qquad (4.5)$$

$$\Delta f(\omega) \le 1/\big(2\pi T_\infty(\omega)\big) = \omega/(2\pi\theta_3 Q) \qquad (4.6)$$
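Equations 4.5 and 4.6 are easy to tabulate. The sketch below reproduces the worked example above. It assumes $\theta_3 \approx 1.9176$ and $\theta_\infty = 4$ for the Hann window, values back-computed from the example's numbers rather than copied from Table A.1, and converts a frequency in Hertz to $\omega = 2\pi f$.

```python
import math

# Hann-window constants back-computed from the worked example in the text
# (assumptions, not values taken from Table A.1).
THETA_3 = 1.9176
THETA_INF = 4.0

def max_time_interval(freq_hz, q, theta_3=THETA_3, theta_inf=THETA_INF):
    """Eq. 4.5 with omega = 2*pi*freq_hz: largest time step in seconds."""
    omega = 2.0 * math.pi * freq_hz
    return 2.0 * math.pi * theta_3 * q / (theta_inf * omega)

def max_freq_interval(freq_hz, q, theta_3=THETA_3):
    """Eq. 4.6 with omega = 2*pi*freq_hz: largest frequency step in Hertz."""
    omega = 2.0 * math.pi * freq_hz
    return omega / (2.0 * math.pi * theta_3 * q)
```

At 1000 Hertz with Q = 3.0 these give about 1.4382 milliseconds and 173.8 Hertz, and at 50 Hertz about 28.76 milliseconds and 8.6914 Hertz, in line with the figures quoted in the text; both intervals scale by the same factor of 20 between the two frequencies.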

4.3  Design  of  a  Constant-Q  Filterbank 

Because  to  date  no  useful  analog  to  the  DFT  has  been 
discovered  for  the  constant-Q  transform,  the  sampling  and 
resolution  information  described  in  Section  4.2  must  be 
used  to  design  an  analysis  algorithm  which  ultimately 
invokes  discrete  convolution  to  simulate  at  selected 
frequencies  the  action  of  a  constant-Q  filterbank.  The 
character  of  an  algorithm  which  correctly  performs  this 
analysis  is  the  subject  of  the  present  section. 

As implied by 4.5 and 4.6, minimal sampling along both
the time and frequency axes is performed according to a
non-uniform scheme. The problem of determining at what
points the continuous spectral domain ought to be sampled
is simplified by considering the domain to be the output of
an analog filterbank. It is then necessary to specify
individual band center frequencies and bandwidths. Figure
3.4 shows a portion of an idealized filterbank which
minimally samples the frequency dimension of the constant-Q
spectral domain. The relationship between adjacent band
center frequencies is established by the following
observation, given that the window function, $h(t)$, is real
(the demodulated filter magnitude must be even):

$$\omega_{k+1} - \omega_k = \frac{2\pi\,\Delta f(\omega_{k+1})}{2} + \frac{2\pi\,\Delta f(\omega_k)}{2} \qquad (4.7)$$


(See Figure 4.1.) By substituting the expression for the
instantaneous frequency sampling interval given in 4.6 and
by simple algebraic manipulation, the ratio, $R$, of adjacent
band center frequencies may be determined.

$$R = \frac{\omega_{k+1}}{\omega_k} = \frac{2Q\theta_3+1}{2Q\theta_3-1} \qquad (4.8)$$


This ratio, along with the location of any band in a
constant-Q filterbank, determines the location of any other
band as follows:

$$\omega_k = \omega_{k-n}\,R^n \qquad (4.9)$$

Hence, an analog filterbank which performs constant-Q
analysis using a minimal set of bands is completely
specified by defining the basic window or filter function
(from which $\theta_3$ is determined), the analysis Q, the total
analysis bandwidth, and the center frequency of any
analysis band.
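The band layout of 4.8 and 4.9 can be generated directly. In the sketch below, the reference center `omega_ref`, the values of Q and $\theta_3$, and the band counts are hypothetical parameters. Each adjacent pair produced this way satisfies 4.7 once $2\pi\Delta f(\omega) = \omega/(\theta_3 Q)$ from 4.6 is substituted, which is a quick way to confirm the algebra behind 4.8.

```python
def adjacent_ratio(q, theta_3):
    """Eq. 4.8: ratio R of adjacent band centre frequencies."""
    return (2.0 * q * theta_3 + 1.0) / (2.0 * q * theta_3 - 1.0)

def band_centres(omega_ref, q, theta_3, n_below, n_above):
    """Eq. 4.9: omega_k = omega_ref * R**k for k = -n_below .. n_above,
    where omega_ref is the centre of any one analysis band."""
    r = adjacent_ratio(q, theta_3)
    return [omega_ref * r ** k for k in range(-n_below, n_above + 1)]

# Usage with illustrative values (Q = 3, theta_3 ~ 1.9176 for a Hann window).
centres = band_centres(1000.0, 3.0, 1.9176, 5, 5)
```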


Figure 4.1. Relationships among adjacent filters in a
minimally-sampled constant-Q filterbank. This development
assumes real analysis window functions (which have even
frequency-domain magnitudes).

4.4 Implementation Details

Until this point, the discussion of sampling has
assumed continuous time signals and a finite set of analog
filters. Thus, only the frequency dimension has been
discretized. Discrete-time implementation of the
above-specified filterbank is straightforward, requiring
application of digital filter design and biplexed [22]
fast convolution [23]. Careful attention must be given to
issues such as elimination of differences in the linear
phase introduced by the complex modulation of the various
analysis filters. The maximum interval over which the
output of the kth band of the filterbank may be sampled is
that given by

$$\Delta t(\omega_k) = 2\pi\theta_3 Q/(\theta_\infty\,\omega_k) \qquad (4.10)$$

In practice, all analysis channels may be designed to
operate at a common sampling frequency which is greater
than or equal to the total analysis bandwidth. Hence, for
analysis of a segment which has been bandlimited to a total
bandwidth of $2\omega_h$ and sampled at $\omega_h/\pi$, the total
computational expense is equal to the sum of the costs of
the individual complex demodulations and fast convolutions
for each analysis band.

Of course, this uncomplicated implementation, shown in
Figure 4.2a, is wasteful of computational resources. That
this is so is made obvious by comparing the total signal
bandwidth, $2\omega_h$, at which analysis is performed, with the
bandwidths of the highest and the lowest analysis output
bands, given typical analysis parameters. With a
third-octave Hann filterbank whose highest channel is
centered at … and whose lowest channel is centered at …,
the portion of computation performed unnecessarily
varies between 54% at the highest channel and 99% in the
lowest, with the average waste equal to 88%. Much of this
unnecessary computation could be eliminated by bandlimited
resampling of the individual complex channel signals after
complex demodulation and prior to (or possibly as a part
of) lowpass filtering. Rabiner and Crochiere [24] have
described an efficient algorithm for performing bandlimited
sampling rate reduction. Their method is a multistage
extension of the technique described by Schafer and Rabiner
[25], wherein a rate change by a rational factor is
accomplished using a single operation which views the rate
change as a cascaded interpolation and decimation, and
takes advantage of the bandlimiting by computing only the
necessary output points and by avoiding multiplications
involving zeros in the input. The multistage technique
cascades such optimal interpolators and decimators to
achieve large efficiency improvements, while preserving
linear phase and reducing much of the finite-precision
arithmetic error associated with the less efficient
canonical schemes. Moreover, most of the advantage of the
multistage algorithm is gained with only two stages of
interpolation and two stages of decimation, all of which may
be automatically designed and implemented. Another
significant characteristic of analysis using such a system
is that the filtering step in analysis may be performed
using a single filter rather than the several individual
analysis filters suggested by equation 3.5. This scheme for
constant-Q analysis is presented in Figure 4.2b. As in the
straightforward implementation, attention must be given to
the correction of phase disparities which may develop among
the analysis channels as the result of non-zero phase
resampling filters or because of a non-zero phase analysis
filter operating on signals of differing sampling rates.

Figure 4.2. Alternative implementations of constant-Q
analysis and synthesis. A uniform time-sampling approach
is shown in (a), while the implementation suggested in (b)
allows minimal time sampling.

An  alternative  computat ional  algorithm  recently 

suggested  by  Kates  [26],  in  connection  with 
perception- related  analysis  of  loudspeaker  performance, 

achieves  computational  elegance  at  the  cost  of  the 
necessity  of  uniform  sampling  in  both  time  and  frequency. 
Kates'  s  simplification  consists  of  the  restriction  that 
window  functions,  h(t)  ,  be  decaying  exponentials.  If  we 
define  h(  t)  as , 

$$ h(t) = \begin{cases} e^{-\mu t}, & t \ge 0 \\ 0, & t < 0 \end{cases} \qquad (4.11) $$

where μ is a positive, real constant, then 3.2 may be written as

$$ F(\omega,t) = \int_{t}^{\infty} f(\tau)\, e^{-\mu\omega(\tau - t)}\, e^{-j\omega\tau}\, d\tau \qquad (4.12) $$

For any analysis time, t_0, it can be shown that 4.12 is equivalent to

$$ F(\omega,t_0) = e^{-j\omega t_0} \int_{0}^{\infty} f_{t_0}(t)\, e^{-t\omega(j+\mu)}\, dt \qquad (4.13) $$

where f_{t_0}(t) = f(t + t_0). This is easily discretized by periodically sampling f_{t_0}(t) from t = 0 at a rate at least equal to its bandwidth in Hertz. Then,

$$ e^{j\omega t_0}\, F(e^{j\omega}, t_0) = \sum_{n=0}^{\infty} f_{t_0}(n)\, e^{-n\omega(j+\mu)} \qquad (4.14) $$

Notice that the z-plane has been evaluated, not along the unit circle, but along the spiral given by z = e^{ω(j+μ)} = e^{jω}e^{μω}. This particular form of the z-transform is implementable using the chirp z-transform algorithm [27].
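The spiral evaluation in 4.14 can be sketched in Python with NumPy. The function below is an illustrative assumption, not Kates's code: it computes X_k = Σ_n x(n) e^{−n(j+μ)ω_k} at equally spaced frequencies ω_k = ω_0 + kΔω for a finite record x(n) (standing in for the sampled f_{t_0}, with the infinite sum truncated), using Bluestein's form of the chirp z-transform, which rests on the identity nk = (n² + k² − (k − n)²)/2 and a single FFT convolution. The name `spiral_czt` and its parameters are hypothetical.

```python
import numpy as np

def spiral_czt(x, omega0, domega, mu, m):
    """Return X[k] = sum_n x[n] * exp(-n*(1j + mu)*omega_k) for k = 0..m-1,
    with omega_k = omega0 + k*domega, i.e. the z-transform of the finite
    record x on the spiral z_k = exp((1j + mu)*omega_k).
    Uses Bluestein's chirp z-transform: nk = (n**2 + k**2 - (k - n)**2) / 2."""
    x = np.asarray(x, dtype=complex)
    n_len = x.size
    s = 1j + mu                      # complex rate of the spiral
    beta = s * domega                # log of the constant ratio z_{k+1} / z_k
    n = np.arange(n_len)
    k = np.arange(m)
    # Pre-multiplied input: y[n] = x[n] exp(-n s omega0) exp(-n^2 beta / 2)
    y = x * np.exp(-n * s * omega0 - n**2 * beta / 2)
    # Chirp v[q] = exp(q^2 beta / 2) for q = -(n_len - 1) .. m - 1, stored circularly
    size = 1 << int(np.ceil(np.log2(n_len + m - 1)))
    v = np.zeros(size, dtype=complex)
    v[:m] = np.exp(k**2 * beta / 2)
    q = np.arange(1, n_len)
    v[size - q] = np.exp(q**2 * beta / 2)
    # Linear convolution sum_n y[n] v[k - n] via zero-padded FFTs
    conv = np.fft.ifft(np.fft.fft(y, size) * np.fft.fft(v))[:m]
    # Post-multiply by exp(-k^2 beta / 2) to complete the identity
    return np.exp(-k**2 * beta / 2) * conv
```

For example, `spiral_czt(x, 0.1, 2 * np.pi / 64, 0.02, 16)` returns 16 values for a 32-sample record `x`. Because successive points z_k have the constant ratio e^{(j+μ)Δω}, the spiral fits the chirp z-transform exactly. The chirp factors grow roughly like e^{μΔω m²/2}, however, so long records or steep spirals can lose floating-point precision.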

Still other implementation schemes are possible, such as that currently being investigated by Tracy L. Petersen [28] employing IIR analysis filters, and the use of charge-coupled device (CCD) technology, which promises significant computation speed improvements for a limited set of applications.


4.5  Minimum  Overall  Sampling  Rate 

The non-uniform minimal sampling scheme described in Section 4.4, along with the notion presented in Section 1.2 that constant-Q analysis resembles the dissection of sound performed by the human ear more closely than short-time Fourier analysis does, suggests the possibility of a difference in the overall sampling rates needed to represent the respective domains. The overall rate for either spectral domain is easily derived as the sum of the individual rates over all the channels. For constant bandwidth analysis this overall rate, P_cbw, is

$$ P_{cbw} = \sum_{k=1}^{N} 2p_{cbw} = 2Np_{cbw} \qquad (4.15) $$

where p_cbw is the rate for each channel. Thus,


4.6  Computation  of  Constant-Q  Spectral  Magnitude  and  Phase 
The final topic to be considered in this chapter on the theory affecting implementation is the computation of the constant-Q spectral magnitude and phase functions from the analysis output. That the spectral domain is complex valued is obvious from equation 3.3, which shows F(ω_k,t) for any analysis frequency, ω_k, to be the output of a linear system whose input is a complex demodulated real signal. If we denote the real and the imaginary parts of the constant-Q spectral domain as follows,

$$ F(\omega_k,t) = F_R(\omega_k,t) + jF_I(\omega_k,t) \qquad (4.22) $$

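then the spectral magnitude and principal value phase functions are computed in the usual manner as

$$ M(\omega_k,t) = \{F(\omega_k,t)\,F^*(\omega_k,t)\}^{1/2} \qquad (4.23) $$

$$ \theta(\omega_k,t) = \begin{cases} \tan^{-1}\!\big(F_I(\omega_k,t)/F_R(\omega_k,t)\big), & F_R \ge 0,\ \text{all } F_I \\ \tan^{-1}\!\big(F_I(\omega_k,t)/F_R(\omega_k,t)\big) + \pi, & F_R < 0,\ F_I \ge 0 \\ \tan^{-1}\!\big(F_I(\omega_k,t)/F_R(\omega_k,t)\big) - \pi, & F_R < 0,\ F_I < 0 \end{cases} \qquad (4.24) $$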
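Equations 4.23 and 4.24 translate directly into array code. The sketch below (Python with NumPy; the function name is hypothetical) spells out the three branches of 4.24 so they can be compared with the formula; it assumes F_R and F_I are never both zero at the same sample, where the phase is undefined.

```python
import numpy as np

def magnitude_and_phase(F):
    """Magnitude (4.23) and principal-value phase (4.24) of complex samples F.
    Assumes F_R and F_I are not both zero at any sample."""
    F = np.asarray(F, dtype=complex)
    FR, FI = F.real, F.imag
    M = np.sqrt((F * np.conj(F)).real)                 # {F F*}^(1/2)
    with np.errstate(divide="ignore"):
        base = np.arctan(FI / FR)                      # principal arctangent, [-pi/2, pi/2]
    theta = np.where(FR >= 0, base,                    # F_R >= 0, all F_I
                     np.where(FI >= 0, base + np.pi,   # F_R < 0, F_I >= 0
                              base - np.pi))           # F_R < 0, F_I < 0
    return M, theta
```

In practice the same results come from `np.abs(F)` and `np.angle(F)`; the latter computes the two-argument arctangent, which implements the same piecewise rule.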

It should be noted that the above non-linear operations inevitably produce signals which are not band-limited when bandwidth is measured in terms of conventional rectangular width. Hence, it is possible that magnitude and phase functions computed from adequately sampled complex data, F(ω_k,t), could be undersampled. It is also true, however, that the undersampling of the magnitude is not serious. Undersampled areas of the waveform inevitably occur in the magnitude troughs created where either the real or the imaginary part changes sign, and such low energy areas contribute but a small fraction of the total spectral mass. The phase function, on the other hand, experiences rapid movements, and even discontinuities of π, when either the real or the imaginary part becomes very small, regardless of efforts to bandlimit it by oversampling. The problem of correctly estimating the spectral phase function is further complicated by "leakage" through the side lobes of the analysis filter, and by finite arithmetic error. These problems, compounded by the fact that only the principal value or wrapped phase is directly computable from the real and imaginary parts, make the estimation of the spectral phase function difficult, particularly in the broadband high frequency analysis channels.

A number of techniques for the estimation of the sampled spectral phase function were investigated during the course of this research. Three general techniques, labelled Methods I, II and III, are outlined below.

Method I for constant-Q spectral phase unwrapping circumvents the necessity of removing the 2π jumps inherent to the principal value inverse tangent function by directly estimating and integrating the time derivative of the phase. The phase derivative is estimated [1] using the property that
$$ \dot\theta(\omega_k,t) = \frac{\partial}{\partial t}\tan^{-1}\!\left(\frac{F_I}{F_R}\right) \qquad (4.25) $$

$$ \dot\theta(\omega_k,t) = \frac{F_R\dot F_I - F_I\dot F_R}{F_R^2 + F_I^2} \qquad (4.26) $$

where θ̇(ω_k,t) represents the time derivative of the true unwrapped phase, F_R = F_R(ω_k,t) and F_I = F_I(ω_k,t), and Ḟ_R and Ḟ_I are the respective time derivatives. Then,

$$ \theta(\omega_k,t) = \int_0^t \dot\theta(\omega_k,\tau)\, d\tau \qquad (4.27) $$


The difficulty with this otherwise elegant scheme is that in a sampled implementation the derivative and integral can be computationally expensive procedures which are, at best, subject to the limitations of finite arithmetic. Hence, the discrete integral may drift from its true value. To minimize this effect, second degree interpolators were used to estimate the sampled derivative and integral functions as follows:

$$ \dot F_R(i) = \frac{F_R(i-2) - 8F_R(i-1) + 8F_R(i+1) - F_R(i+2)}{12} \qquad (4.28) $$

$$ \dot F_I(i) = \frac{F_I(i-2) - 8F_I(i-1) + 8F_I(i+1) - F_I(i+2)}{12} \qquad (4.29) $$

$$ \theta_I(i) = \theta_I(i-1) + \frac{5\dot\theta_I(i-1) + 8\dot\theta_I(i) - \dot\theta_I(i+1)}{12} \qquad (4.30) $$

where F_R(i) = F_R(ω_k, iΔt(ω_k)) and F_I(i) = F_I(ω_k, iΔt(ω_k)), and where

$$ \dot\theta_I(i) = \frac{F_R(i)\dot F_I(i) - F_I(i)\dot F_R(i)}{F_R^2(i) + F_I^2(i)} \qquad (4.31) $$
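A compact sketch of Method I in Python with NumPy follows. It applies 4.28 and 4.29 to the real and imaginary parts, forms the phase derivative with 4.31, and accumulates it with the integration rule 4.30. Two choices are assumptions rather than details from the text: all quantities are in per-sample units (Δt(ω_k) taken as one step), and the running integral starts from the principal-value phase at the first sample where the five-point derivative exists. The function name is hypothetical.

```python
import numpy as np

def unwrap_method_one(F):
    """Sketch of Method I, eqs. (4.28)-(4.31), for one sampled channel F(i).

    Derivatives and the integral are in per-sample units. The result covers
    samples 2 .. len(F)-4 and is initialised (an assumption) with the
    principal-value phase of sample 2.
    """
    F = np.asarray(F, dtype=complex)
    FR, FI = F.real, F.imag
    i = np.arange(2, F.size - 2)
    # Five-point derivative estimates, eqs. (4.28) and (4.29)
    dFR = (FR[i - 2] - 8 * FR[i - 1] + 8 * FR[i + 1] - FR[i + 2]) / 12
    dFI = (FI[i - 2] - 8 * FI[i - 1] + 8 * FI[i + 1] - FI[i + 2]) / 12
    # Phase derivative, eq. (4.31); dtheta[m] belongs to sample m + 2
    dtheta = (FR[i] * dFI - FI[i] * dFR) / (FR[i] ** 2 + FI[i] ** 2)
    # Second-degree integration rule, eq. (4.30); theta[m] belongs to sample m + 2
    theta = np.empty(dtheta.size - 1)
    theta[0] = np.angle(F[2])
    for m in range(1, theta.size):
        theta[m] = theta[m - 1] + (5 * dtheta[m - 1] + 8 * dtheta[m] - dtheta[m + 1]) / 12
    return theta
```

The loop makes the drift mechanism visible: each step adds a small derivative-estimation error, and nothing pulls the running sum back toward the wrapped phase.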


A second general method, Method II, due to Portnoff [11], assumes the accuracy of the discrete phase difference as an estimate of instantaneous frequency. Given this assumption we may argue that, because each analysis channel signal is bandlimited, the instantaneous frequency of each channel must be similarly bandlimited. Thus, if a channel is sampled at at least twice the rate implied by its bandwidth, the values of the true phase difference must fall in the interval (−π/2, π/2). To unwrap phase under these assumptions, we simply add or subtract integer multiples of π to the wrapped phase difference until this condition is met, and then calculate the unwrapped phase as the running sum of the corrected phase differences. An equivalent form of this method, which avoids the accumulation of arithmetic error in the sum, is given by

$$ \theta_{II}(i) = \theta_p(i) + \pi\left\lfloor \frac{\theta_{II}(i-1) - \theta_p(i)}{\pi} + \frac{1}{2} \right\rfloor \qquad (4.32) $$

where θ_p(i) and θ_II(i) are respectively the wrapped phase and the estimate of the true, unwrapped phase for a given channel, and ⌊x⌋ indicates the floor of x (the largest integer not greater than x). The difficulty with this scheme lies in the initial assumption, which fails frequently for the broadband high frequency constant-Q channels.
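Equation 4.32 amounts to choosing, at each sample, the value θ_p(i) + mπ closest to the previous estimate. A minimal Python/NumPy sketch, assuming the first wrapped value is taken as the starting estimate (the function name is hypothetical):

```python
import numpy as np

def unwrap_method_two(theta_p):
    """Method II sketch, eq. (4.32): snap each wrapped value theta_p(i) to the
    member of {theta_p(i) + m*pi} nearest the previous estimate, which keeps
    each corrected phase difference in [-pi/2, pi/2)."""
    theta_p = np.asarray(theta_p, dtype=float)
    theta = np.empty_like(theta_p)
    theta[0] = theta_p[0]                      # assumed starting value
    for i in range(1, theta_p.size):
        m = np.floor((theta[i - 1] - theta_p[i]) / np.pi + 0.5)
        theta[i] = theta_p[i] + np.pi * m
    return theta
```

Because the correction is recomputed from the wrapped phase at every step, rounding errors do not accumulate, but any true phase step larger than about π/2 in magnitude is misread, which is the failure mode described for the broadband channels.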

Still another method which was investigated, Method III, utilizes the estimate of the phase derivative as computed in Method I, in conjunction with the knowledge that the true phase can differ from the wrapped phase only by integer multiples of π. Hence, to estimate the phase at any point, the phase derivative is estimated at that point and integrated to produce a phase estimate from which a phase difference may be computed. The estimated phase difference is then added to the phase estimate of the previous point (assumed now to be correct). This new value is then forced to the nearest value which differs from the wrapped phase by an integer multiple of π. Formally, this method is expressed as

$$ \theta_{III}(i) = \theta_p(i) + \pi\left\lfloor \frac{\theta_{III}(i-1) + \theta_I(i) - \theta_I(i-1) - \theta_p(i)}{\pi} + \frac{1}{2} \right\rfloor \qquad (4.33) $$

where θ_p(i) is the wrapped phase, θ_I(i) the Method I integrated phase derivative estimate, and θ_III(i) the Method III phase estimate. Clearly, the sources of error in this method are the inaccuracy in estimating the phase derivative function, and the integration of the phase derivative across the interval between the previous point and the current point. The cumulative integration errors inherent to Method I do not, however, occur in this method.
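Method III differs from 4.32 only in the prediction that is snapped to the set of values θ_p(i) + mπ: the previous estimate is first advanced by the Method I increment θ_I(i) − θ_I(i − 1). A Python/NumPy sketch, assuming the first wrapped value serves as the starting estimate and that θ_I is supplied as an array aligned with θ_p (the function name is hypothetical):

```python
import numpy as np

def unwrap_method_three(theta_p, theta_i):
    """Method III sketch, eq. (4.33): advance the previous estimate by the
    Method I phase increment, then snap to the nearest theta_p(i) + m*pi."""
    theta_p = np.asarray(theta_p, dtype=float)
    theta_i = np.asarray(theta_i, dtype=float)
    theta = np.empty_like(theta_p)
    theta[0] = theta_p[0]                      # assumed starting value
    for i in range(1, theta_p.size):
        predicted = theta[i - 1] + theta_i[i] - theta_i[i - 1]
        m = np.floor((predicted - theta_p[i]) / np.pi + 0.5)
        theta[i] = theta_p[i] + np.pi * m
    return theta
```

Since only differences of θ_I enter, a slow drift in the Method I estimate does not propagate; only the error within a single step matters, and it must stay below π/2 for the rounding to land on the correct value.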


Variations of the three methods outlined above were all found to perform imperfectly. The assumption made in Method II, while excellent for narrow band low frequency channels, was inadequate for high channels. The reason for this is shown in Figure 4.3, which is a histogram of phase differences computed on the output of Method I for a high and a low channel (both channels were oversampled by a factor of three). A signal synthesized after unwrapping using Method I contained less error-induced noise than a signal synthesized after unwrapping via either Method II or Method III. Method III, as might be expected, produced fewer unwrapping errors than Method II, resulting in the introduction of slightly less error-induced noise than was observed with the use of the Method II unwrapper.

Figure 4.3. Channel phase-difference histograms. To produce these histograms, a signal was upsampled by a factor of three, analyzed with Q = 11.5, and unwrapped using Method I. Discrete phase differences were then computed. In (a), computed from a low-frequency channel, phase differences are concentrated in the region immediately surrounding zero frequency. In (b), however, phase differences extend beyond the interval argued as justification for Method II. The histogram (b) was computed from a high-frequency channel.

Apparently, the ear is more tolerant of gradual phase drift than it is of sudden jumps. In the author's experience, such gradual changes were noticeable only as a low-level random modulation of background hiss. Sudden phase displacements, on the other hand, gave rise to a "gurgling" noise, the subjective intensity of which increased with the greater number of errors committed by Method II. This noise was occasionally large enough to partially mask low energy stops or fricatives.

4.7  Some  Additional  Notes  on  Phase 

It should be noted that, for most applications, it is unnecessary to estimate the constant-Q phase function. The spectral phase is important in this research for two reasons. First, its correct estimation is critical to the solution of the rate modification problem formulated in Chapter 5. The second reason for interest in the constant-Q spectral phase involves its role in conveying perceptually important signal information. Callahan [7] and others have noted that in speech processing, depending on the analysis window resolution, the short-time spectral phase contains mostly excitation information, while the spectral magnitude is dominated by vocal tract or formant information. Analogous behavior was found to occur in the case of the constant-Q analysis. This condition was tested by observing the results of syntheses performed using magnitude only (spectral phase set to zero) and phase only (magnitude set to unity). In listening tests, it was observed that, for analysis selectivities substantially larger than the ear's (Q = 11.5, for example), vocal tract information was concentrated principally in the magnitude, with a very small portion appearing in the signal reconstructed from phase only. Both large-Q syntheses contained easily distinguishable pitch information, but the phase signal was found to be the dominant carrier of excitation information. Lowering the analysis Q to be comparable to or less than the ear's selectivity increased the dominance of vocal tract information in both the magnitude-only and phase-only reconstructions. Hence, for low Q, the magnitude-only synthesis contained no pitch information, while the phase synthesis contained both vocal tract and excitation information. These results agree with the observations made by both Flanagan [1] and Callahan [7] that narrow bandwidth channels in a filterbank force excitation information into the phase signal, while wider bandwidth channels can convey this information via the magnitude. That the magnitude contains much excitation information for low Q is shown in Figure 4.4. (In this and other spectrograms appearing in this work, light intensity is proportional to the preemphasized spectral magnitude, |ωF(ω,t)|. Thus, the bright areas indicate the presence of spectral power, while the darker background areas occur where less activity is present.) Note the strong horizontal line (and its harmonics) corresponding to pitch, as well as the periodic pitch-related variation in the energy of the higher frequency structures often referred to as formants.
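The magnitude-only and phase-only conditions used in these listening tests correspond to two simple modifications of the complex spectral domain before synthesis. A minimal Python/NumPy sketch (hypothetical helper names; the synthesis step itself is not shown):

```python
import numpy as np

def magnitude_only(F):
    """Spectral domain with the constant-Q phase set to zero: |F(w_k, t)|."""
    return np.abs(F).astype(complex)

def phase_only(F):
    """Spectral domain with the magnitude set to unity: exp(j*theta(w_k, t)).
    Samples with F = 0 have no defined phase; np.angle returns 0 there (an assumption)."""
    return np.exp(1j * np.angle(F))
```

Multiplying the two results recovers the original samples, which is a convenient check that splitting the spectral domain lost no information.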
CHAPTER 5


TEMPORAL  AND  HARMONIC  SCALING 

5.1  Introduction  and  Background 

The auditory system is, next to the visual system, the broadest bandwidth channel available for communication with the human mind. Indeed, when comprehension is used as a measure, evidence [29] suggests that the auditory channel may exceed the visual channel in its usefulness in information transfer. Yet, as anyone can observe, the mind is capable of comprehension rates well beyond the rate at which speech is normally articulated, and even beyond the rate at which it can accurately be produced by the vocal tract, which has a practical upper limit around three hundred words per minute. In recognition of this fact, Fletcher [30] in 1929 experimented with increased speech presentation rate. These experiments involved simple time scaling of the speech waveform by modifying the speed of a mechanical playback medium. Fletcher found that the accompanying spectral distortions, the scaling of the frequency domain by the inverse of the time domain scaler, imposed a rather narrow limit on the range over which speech so processed is intelligible. Later work by Steinburg [31] confirmed Fletcher's basic result that intelligibility drops off rapidly from 80 per cent for time scale factors outside the interval 0.7 to 1.2. Fundamental understanding was later applied to the problem by Miller and Licklider [32], who recognized the existence of redundant information in the speech waveform, particularly during vowels and pauses. Interested primarily in taking advantage of this redundancy to facilitate time multiplexing of speech on limited bandwidth channels, they showed that periodic deletion of segments amounting to 50 per cent of the total waveform, if performed at the proper rate, would reduce intelligibility less than 10 per cent. This information was soon applied by Garvey [33] to the speech rate compression problem. Garvey investigated the possibility of concatenating the segments created by Miller and Licklider's deletions, thus producing a time signal which could be substantially shorter than the original, but whose spectral content had not been materially changed. Performed by magnetic tape cut and splice, Garvey's experiments showed better than 90 per cent intelligibility for compression factors as high as 2.5, and linear decay of intelligibility to 40 per cent for a compression of 4.0. This successful method was soon automated by Fairbanks [34] and others, and remains the philosophical basis of the bulk of compression work to the present. More recent work includes the use of digitized signals which can be easily manipulated to allow pitch-synchronous splicing of segments (see references given in Chapter 1 of Portnoff [11]). While such methods reduce the worst of the artifacts due to arbitrary splicing, the difficulty of tracking pitch accurately, particularly where noise is present, can cause them to behave poorly.

Crude as these methods seem, they have helped define what is meant by speech compression. As Fletcher showed at the beginning, a speech compression algorithm must delineate between speech characteristics which are perceived in time and those which are perceived as having frequency significance. Furthermore, as noticed by Fairbanks and others, the preservation of waveform intelligibility requires that modifications be made over distances longer than fundamental wavelengths, but shorter than the duration over which the harmonic character of the signal can change. The division of information into what is referred to herein as temporal and harmonic information is a notion familiar to builders of vocoders, where bandwidth is greatly reduced by extracting and transmitting slowly varying harmonic information as a function of time. Hence, the vocoder is a natural tool for speech compression/expansion. In a vocoder rate change system the parameter signals produced by the analysis are compressed or expanded in time prior to synthesis. During synthesis, the implicit harmonic content is restored, unaltered. Hence, in theory, only the temporal scale is modified. Probably the highest quality result obtained in vocoder speech compression/expansion to date was reported by
absence of code domain modification (such as
compression,  expansion,  noise  addition,  filtering,  etc.). 

5.2 The Resolution Issue in Compression/Expansion

A  fundamental  issue  in  connection  with  vocoder 
compression/expansion  schemes  involves  the  fact  that  the 
In an argument in favor of temporally adaptive time resolution in vocoders, Patisaul and Hammet [15] conclude that

. . . there is no optimum compromise in time-frequency resolution. Instead, the 'filter' nature of the hearing process, the extremes in the articulatory dynamics of speech production, the desire for the validity of the stationary model and the concept of a time-frequency cell 'matched' to the signal suggest that the shape of the resolution rectangle in vocoder spectrum analysis should be adapted to the signal. [p. 1298]

Evidence indicates that a constant bandwidth analysis is consistent with neither the human auditory system in general nor a correct formulation of the rate compression/expansion problem in particular. For example, recent automatic phoneme recognition work by Searle [6] suggests that the information by which various burst and stop phonemes are recognized occurs with time resolutions finer than 20 msec, and probably as fine as 5 to 10 msec. The auditory system, on the other hand, hears tones with fundamentals longer than 20 msec. Thus, the constant-Q transform, which maps signals into a two-dimensional space where time and frequency resolutions are dependent on analysis frequency, provides a more natural tool for performing independent modifications to temporal or harmonic aspects of signals. The problem, then, of defining what portions of a signal ought to be compressed or expanded in a speech rate change system is at least partially solved by requiring the time-frequency boundary to be a variable related to the ear's frequency-dependent boundary.

5.3 Constant-Q Temporal/Harmonic Compression/Expansion

The approach to rate changes taken in the work reported here utilizes a property of the constant-Q transform not shared by the short-time Fourier transform. This property, proved in Section 3.6.2, is as follows:

$$ f(\alpha t) \;\longleftrightarrow\; \frac{1}{|\alpha|}\,F(\omega/\alpha,\ \alpha t) \qquad (5.1) $$
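Property 5.1 can be checked numerically for one concrete window. The Python/NumPy sketch below assumes the exponential window of 4.11 in the form of 4.12, a hypothetical decay constant MU = 0.2, a positive scale factor α, and an arbitrary smooth test signal; the integral is truncated after 40 window time constants and evaluated by the trapezoid rule. None of these choices come from Section 3.6.2, and the helper name `cq_exp` is hypothetical.

```python
import numpy as np

MU = 0.2  # hypothetical decay constant of the exponential window, eq. (4.11)

def cq_exp(f, omega, t, decay_lengths=40.0, n=200001):
    """F(omega, t) = int_t^inf f(tau) exp(-MU*omega*(tau - t)) exp(-j*omega*tau) dtau,
    truncated after `decay_lengths` window time constants and evaluated with
    the trapezoid rule (omega > 0 assumed)."""
    tau = np.linspace(t, t + decay_lengths / (MU * omega), n)
    y = f(tau) * np.exp(-MU * omega * (tau - t) - 1j * omega * tau)
    return np.sum((y[1:] + y[:-1]) * np.diff(tau)) / 2

def f(tau):
    return np.exp(-tau**2) * np.cos(3 * tau)   # arbitrary smooth test signal

alpha, omega, t0 = 2.0, 2.0, -0.5
lhs = cq_exp(lambda tau: f(alpha * tau), omega, t0)       # transform of f(alpha*t)
rhs = cq_exp(f, omega / alpha, alpha * t0) / abs(alpha)   # F(omega/alpha, alpha*t)/|alpha|
print(abs(lhs - rhs))
```

The two sides agree to rounding error because the substitution u = ατ maps one quadrature grid onto the other; the check therefore confirms the change of variables behind 5.1 for this window rather than the accuracy of the quadrature itself.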

This property can be used to relate a change of scale of either the temporal or the harmonic spectral information to a change of scale of both the time domain signal and the other spectral axis. Assume, for example, the possibility of scaling the temporal axis of the constant-Q spectrum by α. This would give a new spectral function, F'(ω,t),

$$ F'(\omega,t) = F(\omega,\ \alpha t) \qquad (5.2) $$

If the signal, f'(t), resulting from substituting F'(ω,t) into 5.1 were time scaled by 1/α, the result, F''(ω,t), using the constant-Q time scaling property would be

$$ F''(\omega,t) = |\alpha|\,F'(\alpha\omega,\ t/\alpha) \;\longleftrightarrow\; f'(t/\alpha) \qquad (5.3) $$

This may be written in terms of F(ω,t) as

$$ F''(\omega,t) = |\alpha|\,F(\alpha\omega,\ t) \qquad (5.4) $$


Thus, as illustrated in Figure 5.1, a harmonically scaled constant-Q spectral domain is related to a temporally scaled constant-Q spectral domain by a change of the signal's time scale.

5.4 Implementation of a Constant-Q Compressor/Expander

Because of the relationship explained in the above section, independent scaling of either the temporal or the harmonic axis may be performed if one or the other is possible. Both methods are outlined in the following sections, which review the issues involved in temporal or harmonic scaling.

5.4.1 Temporal Compression/Expansion

Modification of the scale of the temporal axis of the constant-Q spectral domain can be performed as indicated in Figure 5.2a. In this block diagram, each channel output of a continuous-time, discrete-frequency analyzer is time scaled by α prior to ordinary synthesis. As shown, this time scaling of discrete-time data is accomplished by resampling the data while holding the implicit sampling frequency constant. Thus, for example, if an analyzer channel output is represented as F(ω_k, iΔt(ω_k)), the time scaled channel data may be written as F_α(ω_k, iΔt(ω_k)) = F(ω_k, αiΔt(ω_k)). This is efficiently accomplished for any rational scalar, α, using a method such as that described by Rabiner and Crochiere [24] (see Section 4.4). However, because each channel signal is itself subject to the Fourier scaling property, this operation modifies each channel amplitude by 1/α, and the bandwidth of each channel by a factor of α. The correction of this unwanted effect is a fundamental issue in independent modification of the spectral axes, and will be discussed in Sections 5.4.3 and 5.5. For now, it is important to ensure that the sampling period, Δt(ω_k), is adequate to represent the modified bandwidth of F_α(ω_k, iΔt(ω_k)) without aliasing. If, as in Figure 5.2a, the analysis output is sampled at the original signal sampling rate, and if α times the bandwidth of channel k is less than the signal sampling frequency, then adequate bandwidth exists.

Where α is less than unity (temporal expansion), each channel will occupy less bandwidth as the result of resampling, obviating the above condition.
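The resampling step F_α(ω_k, iΔt(ω_k)) = F(ω_k, αiΔt(ω_k)) can be sketched for a single channel in Python with NumPy. Linear interpolation of the real and imaginary parts is a deliberately crude stand-in for the polyphase method of Rabiner and Crochiere [24] and is adequate only for heavily oversampled channel signals; the function name is hypothetical.

```python
import numpy as np

def time_scale_channel(F, alpha):
    """Return F_alpha(i) = F(alpha * i) for one sampled channel signal F(i),
    keeping the implicit sampling interval fixed. Linear interpolation of the
    real and imaginary parts stands in for proper polyphase resampling."""
    F = np.asarray(F, dtype=complex)
    grid = np.arange(F.size)
    n_out = int(np.floor((F.size - 1) / alpha)) + 1   # keep alpha*i inside the record
    src = alpha * np.arange(n_out)
    return np.interp(src, grid, F.real) + 1j * np.interp(src, grid, F.imag)
```

With α = 1.5, a 600-sample channel signal e^{j0.05i} becomes a 400-sample signal that is, apart from a small interpolation error, e^{j0.075i}: its frequency content is stretched by α, which is the bandwidth change the text says must later be corrected.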


5.4.2 Harmonic Compression/Expansion

The above description applies to the process of independent temporal axis scaling. If the goal in scaling the temporal axis was temporal compression/expansion, the synthesis output is simply reproduced at the original sampling frequency with the necessary lowpass filter. Where the purpose was really harmonic axis scaling, the relationship of Figures 5.1b and 5.1c may be used. The temporally-scaled signal is simply time scaled by reproducing it at a sampling rate equal to α times the original rate. This produces a result which occupies the original time duration, but which has been scaled along the harmonic axis (the overall operation is sometimes referred to as frequency multiplication or division).

Direct modification of the scale of the harmonic axis of the constant-Q spectral domain by α is accomplished by complex modulation of the kth channel by a frequency equal to α − 1 times the original channel center frequency, so that the shifted continuous-time channel signal, F_α(ω_k,t), is given by

$$ F_\alpha(\omega_k,t) = F(\omega_k,t)\,e^{j(\alpha-1)\omega_k t} \qquad (5.5) $$

A schematic representation of a harmonic-axis scaler appears in Figure 5.2b. The harmonic shift described above does not, however, modify the bandwidth of the channel signal.

5.4.3  Bandwidth  Scaling  by  Scaling  of  the  Phase-derivative 

The  effect  of  this  error  is  easily  seen  by  the 
following  example. 

Suppose that channels of a continuous-time constant-Q filterbank appear as in Figure 5.3a as they relate to a complex input signal, x(t),

$$ x(t) = e^{j\omega_0 t} \qquad (5.6) $$

The analysis output for each channel can be written as

$$ F(\omega_k,t) = a_k\,e^{j\omega_0 t}\,e^{-j\omega_k t} \qquad (5.7) $$


when 


V= U>   (5.8)

Figure 5.3. Illustration of the effect of bandwidth compensation in frequency axis scaling.

If harmonic scaling is attempted as above without bandwidth compensation, the resulting signal, x_αω(t), is given by


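$$ x_{\alpha\omega}(t) = \sum_k F(\omega_k,t)\,e^{j(\alpha-1)\omega_k t}\,e^{j\omega_k t} \qquad (5.9) $$

$$ x_{\alpha\omega}(t) = \sum_k a_k\,e^{j(\omega_0 + (\alpha-1)\omega_k)t} \qquad (5.10) $$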

As illustrated in Figure 5.3b, this result is not simply a shifted complex sinusoid, but a sum of scaled, unequally shifted sinusoids. In this example, the problem is solved by scaling the phase of the analysis output by α prior to shifting and synthesis. Then


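$$ x_{\alpha\omega}(t) = \sum_k a_k\,e^{j\alpha(\omega_0-\omega_k)t}\,e^{j(\alpha-1)\omega_k t}\,e^{j\omega_k t} \qquad (5.11) $$

$$ x_{\alpha\omega}(t) = e^{j\alpha\omega_0 t}\sum_k a_k \qquad (5.12) $$

Since we require the magnitude of the composite filterbank to be unity, we may write

$$ \sum_k a_k = 1 \qquad (5.13) $$

and

$$ x_{\alpha\omega}(t) = e^{j\alpha\omega_0 t} \qquad (5.14) $$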

As shown in Figure 5.3c, the result is the original complex sinusoid with scaled center frequency.
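The example of 5.6 to 5.14 can be reproduced numerically in Python with NumPy. In the sketch below the channel centre frequencies, the Gaussian weights a_k (normalized to satisfy 5.13), ω_0 and α are all illustrative choices, the synthesis is the plain channel sum assumed in 5.9, and the helper name `harmonic_scale` is hypothetical. Phase scaling unwraps each channel's phase along time with `np.unwrap` before multiplying it by α.

```python
import numpy as np

def harmonic_scale(F, omegas, t, alpha, scale_phase=True):
    """Shift channel k by (alpha - 1)*omega_k (eq. 5.5) and sum the channels as
    in eq. (5.9). With scale_phase=True each channel's unwrapped phase is first
    multiplied by alpha, the bandwidth compensation of eq. (5.11)."""
    F = np.asarray(F, dtype=complex)                 # shape (channels, time samples)
    if scale_phase:
        phase = np.unwrap(np.angle(F), axis=1)       # unwrap along time in each channel
        F = np.abs(F) * np.exp(1j * alpha * phase)
    # e^{j(alpha-1) w_k t} * e^{j w_k t} = e^{j alpha w_k t}
    carriers = np.exp(1j * alpha * np.outer(omegas, t))
    return np.sum(F * carriers, axis=0)

t = np.arange(0.0, 20.0, 0.01)
omegas = np.linspace(1.0, 3.0, 21)                   # illustrative channel centre frequencies
a = np.exp(-((omegas - 2.0) / 0.4) ** 2)
a /= a.sum()                                         # sum_k a_k = 1, eq. (5.13)
omega0, alpha = 2.1, 1.5
F = a[:, None] * np.exp(1j * np.outer(omega0 - omegas, t))   # eq. (5.7)

x_plain = harmonic_scale(F, omegas, t, alpha, scale_phase=False)   # eq. (5.10)
x_comp = harmonic_scale(F, omegas, t, alpha)                       # eq. (5.14)
target = np.exp(1j * alpha * omega0 * t)
print(np.max(np.abs(x_plain - target)), np.max(np.abs(x_comp - target)))
```

Without phase scaling the output is the spread of unequally shifted sinusoids of 5.10 and drifts away from e^{jαω_0 t}; with it, the output matches 5.14 to rounding error.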

The operation by which the bandwidth was scaled in the above example, scaling of the constant-Q spectral phase, is equivalent to that suggested by Flanagan [1] in connection with a compression/expansion system based on the phase vocoder. The effect of scaling the constant-Q spectral phase is understood by recognizing that it equivalently scales the phase derivative (the derivative being a linear operator). The phase derivative is an indication of instantaneous frequency, so that scaling this quantity could be thought of as scaling the bandwidth (Appendix D). A single channel of a constant-Q harmonic scaler, including bandwidth expansion, appears in Figure 5.2b.

It should be noted, as above, that in a discrete-time implementation adequate bandwidth should be allowed in each channel signal when harmonic expansion is performed. For instance, if each channel analysis output were minimally sampled, bandwidth expansion by any factor would cause the channel signal to be undersampled (and hence aliased) by that factor. Following bandwidth compensation, if harmonic scale modification was the goal, the resulting synthesized signal is reproduced at the original sampling frequency using proper anti-imaging filters. If indirect temporal scaling was the desired end, the reproduction sampling frequency and the anti-imaging filter cutoff should be scaled by α.

5.5  Bandwidth  Scaling  Issues 

Although  the  scaling  of  the  constant-Q  spectral  phase 
time-derivative, proposed above as a means of modifying
channel  bandwidth,  was  used  in  this  research  with  good 
results,  another  approach  suggests  that  the  scaling  of  the 
phase  derivative  may  be  only  an  approximate  solution  to  the 
problem  of  bandwidth  scaling.  This  different  approach  will 
be  described  in  the  present  section. 

The problem of scaling the bandwidth of a complex analysis output channel can best be solved if (1) a model describing such signals in a general way exists and if (2) a relation exists which measures bandwidth in terms of the model's parameters. Such a relationship has been described by Kahn and Thomas [36] for amplitude and angle modulated (AAM) signals of the form

$$ x(t) = M(t)\,e^{j\theta(t)} \qquad (5.15) $$

In this model, M(t) is an amplitude-modulating function equivalent to the constant-Q spectral magnitude, and θ(t) is a phase-modulating function equivalent to the constant-Q spectral phase. Kahn and Thomas point out that, in general, the modulating functions may have infinite bandwidth, but that in practice most spectral information is concentrated within a finite band. They propose, as a useful measure of this bandwidth, a second moment measure of the spread of the power spectral density. If by S_x(t,ω) we represent the power spectral density function of x(t), and if R_x(t,τ) equals the autocorrelation function, then

$$ S_x(t,\omega) = \int_{-\infty}^{\infty} R_x(t,\tau)\,e^{-j\omega\tau}\,d\tau \qquad (5.16) $$

and

$$ R_x(t,\tau) = E\{x(t)\,x^*(t-\tau)\} \qquad (5.17) $$

where E{·} denotes mathematical expectation. The instantaneous bandwidth, Ω_x(t), can be defined by the normalized second moment as follows:

$$ \Omega_x(t) = \left\{ \frac{\int \omega^2\, S_x(t,\omega)\,d\omega}{\int S_x(t,\omega)\,d\omega} \right\}^{1/2} \qquad (5.18) $$

This may be rewritten, using Parseval's theorem and the differentiation property of the Fourier integral transform [14], as

$$ \Omega_x(t) = \left\{ \frac{-\,\partial^2 R_x(t,\tau)/\partial\tau^2}{R_x(t,\tau)} \right\}^{1/2}_{\tau=0} \qquad (5.19) $$

$$ \Omega_x(t) = \left\{ \frac{E\{|\dot x(t)|^2\}}{E\{|x(t)|^2\}} \right\}^{1/2} \qquad (5.20) $$


Substituting $x(t)$ from 5.15, and assuming $M(t)$ and $\theta(t)$ to be real-valued and differentiable, and $E\{M(t)\,\dot M(t)\,\dot\theta(t)\} = 0$,

$$\beta_x^2(t) = \frac{E\{\dot M^2(t) + M^2(t)\,\dot\theta^2(t)\}}{E\{M^2(t)\}} \tag{5.21}$$


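This can be written, in light of 5.19, as

$$\beta_x^2(t) = \beta_M^2(t) + \frac{E\{M^2(t)\,\dot\theta^2(t)\}}{E\{M^2(t)\}} \tag{5.22}$$

where $\beta_M(t)$ is the bandwidth of the amplitude-modulating function itself, so that $\beta_M^2(t) = E\{\dot M^2(t)\}/E\{M^2(t)\}$.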
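The decomposition in 5.22 can be checked numerically. The sketch below makes several assumptions that do not come from the text. It uses the complex form of 5.15. It uses a periodogram as the estimate of $S_x$. It substitutes time averages over one period for the expectations, which is an ergodicity assumption. The modulating functions `M` and `theta` are hypothetical test signals. The script computes the second moment 5.18 directly from the spectrum and compares it with the amplitude-plus-phase split of 5.22.

```python
import numpy as np

N = 4096
t = np.arange(N) / N                                  # one period of unit length
omega = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)      # angular frequency of each FFT bin

# Hypothetical test modulating functions (not taken from the thesis).
M = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * t)            # amplitude modulation M(t)
theta = 2.0 * np.sin(2 * np.pi * 5 * t)               # phase modulation theta(t)
x = M * np.exp(1j * theta)                            # AAM signal in the form of 5.15


def deriv(y):
    """Spectral (FFT) derivative of a real, periodic, band-limited sequence."""
    return np.fft.ifft(1j * omega * np.fft.fft(y)).real


# 5.18: second moment of the periodogram, used as the power spectrum estimate.
S = np.abs(np.fft.fft(x)) ** 2
beta_spectral = np.sqrt(np.sum(omega**2 * S) / np.sum(S))

# 5.22: amplitude term plus phase term, with time averages standing in for E{.}.
Mdot, thetadot = deriv(M), deriv(theta)
beta_M2 = np.mean(Mdot**2) / np.mean(M**2)
phase_term = np.mean(M**2 * thetadot**2) / np.mean(M**2)
beta_split = np.sqrt(beta_M2 + phase_term)
am_fraction = beta_M2 / (beta_M2 + phase_term)        # share of beta_x^2 due to AM

print(beta_spectral, beta_split, am_fraction)
```

For this particular test band, the amplitude term contributes only about 2% of $\beta_x^2$. Under these conditions, plain phase scaling is nearly, but not exactly, correct.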


An interpretation of this result is simply that the total bandwidth of an AAM signal has two components. One is due to (1) the bandwidth of the amplitude-modulating signal, and the other to (2) the phase-modulating signal. Suppose the amplitude-modulating signal is a very slowly varying function. Then $\beta_M(t)$ is negligible, $\beta_x(t)$ in 5.22 scales linearly with the phase derivative, and the approximation presented in Section 5.4 is exact. If, however, the amplitude modulation portion of the bandwidth is a significant, but not dominant, portion of the total bandwidth, simple phase scaling will lead to inaccurately scaled bands. Where bandwidth is being expanded, this error leads to misplaced spectral data and possible spectral holes. In extreme cases, it gives rise to an effect similar to comb filtering. To determine a corrected factor by which the phase derivative should be scaled, assume that a correct bandwidth-expanded signal, $C_a(t)$, is given by

$$C_a(t) = M_a(t)\cos\{\omega_k t + \theta_a(t)\} \tag{5.23}$$

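and that

$$M_a(t) = \alpha_1 M(t) \tag{5.24}$$

$$\theta_a(t) = \alpha_2\,\theta(t) \tag{5.25}$$

where $\alpha_1$ and $\alpha_2$ must be real. Then,

$$\beta_a^2(t) = \frac{E\{\dot M^2(t)\} + \alpha_2^2\,E\{M^2(t)\,\dot\theta^2(t)\}}{E\{M^2(t)\}} \tag{5.26}$$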


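Note that $\alpha_1$ has no effect, since it cancels in the ratio. Substitute the expression for $\beta_x^2(t)$ from 5.22 into 5.26, and require $\beta_a(t) = \alpha\,\beta_x(t)$, where $\alpha$ is the desired bandwidth scale factor. The value of $\alpha_2$ can then be determined:

$$\alpha_2 = \alpha\left[1 + \left(1 - \alpha^{-2}\right)\frac{E\{\dot M^2(t)\}}{E\{M^2(t)\,\dot\theta^2(t)\}}\right]^{1/2} \tag{5.27}$$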


This equation implies a conditional relationship between $\alpha$ and $\alpha_2$. When expanding bands ($\alpha > 1$), the number by which the phase derivative must be scaled to scale the bandwidth by $\alpha$ is greater than $\alpha$. When contracting bands, the opposite is true. It should be noted that 5.27 may become imaginary when bands are contracted and the amplitude modulation contribution, $E\{\dot M^2(t)\}/E\{M^2(t)\}$, dominates the total bandwidth. This effect corresponds to a lower limit on the total bandwidth reduction available using the assumptions of equations 5.24 and 5.25.
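The corrected factor of 5.27 can be exercised on a test band. The sketch below makes several assumptions. It works with the complex baseband form $M(t)e^{j\alpha_2\theta(t)}$ rather than the carrier form of 5.23. It uses time averages for $E\{\cdot\}$. The helper names and the test modulating functions are hypothetical. It checks three things: that the corrected factor scales the second-moment bandwidth by exactly $\alpha$, that naive scaling by $\alpha$ does not, and that the radicand goes negative for a strong enough contraction.

```python
import numpy as np

N = 4096
t = np.arange(N) / N
omega = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)


def deriv(y):
    """Spectral (FFT) derivative of a real, periodic, band-limited sequence."""
    return np.fft.ifft(1j * omega * np.fft.fft(y)).real


def rms_bandwidth(x):
    """Second-moment bandwidth of 5.18, with the periodogram standing in for the PSD."""
    S = np.abs(np.fft.fft(x)) ** 2
    return np.sqrt(np.sum(omega**2 * S) / np.sum(S))


def corrected_phase_scale(alpha, M, theta):
    """alpha_2 from 5.27, using time averages for E{.}.

    Returns nan when the radicand is negative, i.e. when the requested
    bandwidth reduction exceeds the limit discussed in the text."""
    am = np.mean(deriv(M) ** 2)                  # E{Mdot^2}
    pm = np.mean(M**2 * deriv(theta) ** 2)       # E{M^2 thetadot^2}
    radicand = 1.0 + (1.0 - alpha**-2) * am / pm
    return alpha * np.sqrt(radicand) if radicand >= 0 else float("nan")


# Hypothetical test band (not from the thesis).
M = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * t)
theta = 2.0 * np.sin(2 * np.pi * 5 * t)
x = M * np.exp(1j * theta)
base = rms_bandwidth(x)

for alpha in (0.8, 1.5, 2.0):
    a2 = corrected_phase_scale(alpha, M, theta)
    corrected = rms_bandwidth(M * np.exp(1j * a2 * theta))    # theta_a = alpha_2 theta
    naive = rms_bandwidth(M * np.exp(1j * alpha * theta))     # plain phase scaling
    print(alpha, a2, corrected / base, naive / base)
```

As the text predicts, $\alpha_2 > \alpha$ when expanding and $\alpha_2 < \alpha$ when contracting.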


5.6 Results

The procedure outlined in this chapter for the independent time or frequency scaling of signals was found to produce good quality results over a range of scaling parameters. Rate compression was achieved for factors up to four. At that point, intelligibility was degraded, primarily because the listener could not assimilate the information rapidly enough. It was also noted that phonemes which occur naturally over very short intervals tend to disappear at high compression ratios. This phenomenon represents a fundamental limit to the uniform definition of compression adopted here. It can be circumvented only by selectively compressing features in a non-uniform manner, an investigation beyond the scope of this work.

Expansion experiments were successfully performed for factors as low as one-third. The speech expansion experiments revealed two fundamental difficulties in constant-Q expansion.

First, for values of Q thought to be comparable to the selectivity of the human auditory system (see Section 1.2), the constant-Q transform was found to have time resolution at high frequencies which was too fine. As shown in Figure 5.4a, the magnitude of high-frequency formant information appears to be modulated pitch-synchronously. As a result, when the time scale of the high bands is stretched and the bands are remodulated, a false modulation of these bands occurs. Its frequency equals the old pitch scaled by the expansion factor. In expansion, this results in subjective effects best described as granularity or roughness of the voice. This effect is reduced or eliminated by increasing the selectivity of the analysis, which makes the transform's time resolution coarser. Spectrograms with reduced time resolution (increased Q) are shown in Figures 5.4b and 5.4c. Temporally expanded synthesis was performed using Q=19. This adjustment corrected the granularity problem without introducing any undesirable side effects.

The second difficulty made evident by the expansion experiments was the presence of artifacts due to spectral phase unwrapping errors. Such errors may be eliminated for expansions by even integer factors, provided the unwrapping method modifies the wrapped phase only by integer multiples of π. In such cases, any error becomes effectively zero (a multiple of 2π) during the expansion of the individual bands. In speech which has not been expanded by an even integer factor, errors of this variety are random in nature and produce a low-level "gurgling" sound. This effect arises because the phase shifts produce random constructive or destructive interference among overlapping channels.
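The even-integer argument can be illustrated with a synthetic band. The sketch assumes, as the text implies, that expanding a band by a factor n multiplies its phase values by n before remodulation. The phase ramp and the injected ±π unwrapping errors are hypothetical. The script performs the comparison described in the next paragraph: it expands the wrapped phase and the (erroneously) unwrapped phase, then checks whether the remodulated bands agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" phase of one band: a smooth, steadily increasing ramp.
true_phase = np.cumsum(rng.uniform(0.2, 0.6, size=500))
wrapped = np.angle(np.exp(1j * true_phase))          # principal value in (-pi, pi]

# An unwrapper that adds only integer multiples of pi to the wrapped phase.
# The correct multiples are even; the injected +-1 terms are deliberate errors.
k = np.round((true_phase - wrapped) / np.pi).astype(int)
k += rng.integers(-1, 2, size=k.size)
unwrapped = wrapped + k * np.pi


def remodulate(phase, n):
    """Band after time expansion by n, assuming the phase values are multiplied by n."""
    return np.exp(1j * n * phase)


agree = {n: np.allclose(remodulate(wrapped, n), remodulate(unwrapped, n)) for n in (2, 3, 4)}
print(agree)
```

For even n, every error of kπ becomes a multiple of 2π and disappears. For n = 3, the odd multiples flip the sign of the affected samples, and the two syntheses differ.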

Because the spectral phase function is difficult to characterize properly, tests for evaluating the unwrappers mentioned in Section 4.6 were difficult to design. The above-mentioned property of expansion by even integer factors provided one good test. In such a test, wrapped and unwrapped spectral phase functions were both temporally expanded, synthesized, and compared. Errors other than integer multiples of π appeared as differences between the two syntheses. Comparison of the effect of expansion


CHAPTER 6


CONCLUSION

6.1 Summary

The potential ability of a constant percentage bandwidth transform to model human auditory analysis of sound has been discussed, as have previous efforts to simulate such analysis. A variation of the Gambardella "multiple filter analyzer integral," referred to herein as the constant-Q transform (CQT), has been shown to be capable of constant-Q analysis of signals.

The major contribution of this work is the establishment of the CQT as an understood, theoretically sound tool for audio signal processing. To this end, a synthesis transform has been proposed which performs a reverse mapping from the constant-Q spectral domain to the time domain. In the absence of spectral modifications, it forms an identity system when cascaded with the analysis transformation. Existence conditions and a proof of the validity of this synthesis transformation have been given. Both the forward and the reverse transformations have been compared with the more familiar short-time Fourier transform using a filterbank analogy, and similarities and important differences have been pointed out.
The effect of spectral modification has been established both for the constant-Q transform and for a general form of the short-time Fourier transform. The dependence of the effect of spectral modification upon the analysis window has been established. Several useful transform properties have been listed and proved.

Principles governing the sampling of the CQT without loss of information have been formalized. Explicit expressions have been derived which relate the characteristics of the analysis window to the CQT selectivity and to a minimal CQT spectral domain sampling pattern. An alternative, logarithmically warped form of the synthesis was also established which enables minimal sampling of the spectral domain. The minimum overall sampling rate of the constant-Q spectrum was shown to be equivalent to the sampling rates of the short-time Fourier transform and of the time domain. Two practical analysis-synthesis algorithms were discussed. One achieves nearly minimal spectral sampling. The other achieves a simpler implementation at the expense of oversampling in time.

The nature and computation of the constant-Q spectral magnitude and phase functions have been investigated. A method was proposed for unwrapping phase in the difficult broad-band high-frequency channels. It attempts to perform band-limiting correction on wrapped phase using a phase derivative estimate computed directly from the real and imaginary parts. The algorithm was shown empirically to perform better than two other, more common methods.
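The phase derivative of a channel can be estimated without unwrapping as $(R\dot I - I\dot R)/(R^2 + I^2)$, where $R$ and $I$ are the real and imaginary parts. The sketch below computes that estimate. It then uses the estimate to guide the choice of 2π multiples at each sample. This is a generic derivative-guided unwrapper shown for illustration; the thesis's band-limiting correction is not reproduced here. The function names, the finite-difference scheme (`np.gradient`), and the test channel are all assumptions.

```python
import numpy as np


def phase_derivative(z, dt):
    """d/dt arg z(t) from the real and imaginary parts, with no unwrapping:
    (R dI/dt - I dR/dt) / (R**2 + I**2)."""
    R, I = z.real, z.imag
    dR = np.gradient(R, dt, edge_order=2)
    dI = np.gradient(I, dt, edge_order=2)
    return (R * dI - I * dR) / (R**2 + I**2)


def derivative_guided_unwrap(z, dt):
    """Hypothetical unwrapper: at each sample, pick the 2*pi multiple that best
    matches the phase predicted by integrating the derivative estimate."""
    wrapped = np.angle(z)
    dphi = phase_derivative(z, dt)
    out = np.empty_like(wrapped)
    out[0] = wrapped[0]
    for i in range(1, len(z)):
        predicted = out[i - 1] + 0.5 * (dphi[i - 1] + dphi[i]) * dt   # trapezoidal step
        out[i] = wrapped[i] + 2 * np.pi * np.round((predicted - wrapped[i]) / (2 * np.pi))
    return out


# Hypothetical test channel: amplitude and phase modulation around 50 Hz.
dt = 2e-5
t = np.arange(0.0, 1.0, dt)
true_phase = 2 * np.pi * 50 * t + 3 * np.sin(2 * np.pi * t)
z = (1.0 + 0.3 * np.cos(2 * np.pi * 2 * t)) * np.exp(1j * true_phase)

print(np.max(np.abs(derivative_guided_unwrap(z, dt) - true_phase)))
```

Note that the amplitude terms cancel in $R\dot I - I\dot R = a^2\dot\phi$. The estimate is therefore unaffected by amplitude modulation, except through the finite-difference error.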

The first high-resolution constant-Q spectrograms of speech have been produced.

Finally, the usefulness of the CQT in actual signal processing has been demonstrated by application to the perception-related problem of rate modification of speech. Good quality modification was achieved for rates between 1/2 and 4. Limitations were explained: some are inherent to the notion of rate changing, some result from the nature of a CQT implementation, and some are computational.

6.2 Further Research and Suggested Applications

The CQT has been established as a well-defined tool for audio processing. Its potential usefulness, however, appears to be limited primarily by the lack of a fast computational algorithm comparable to the FFT. Computing and processing minimally-sampled CQT spectral data is an area which merits future investigation.

Other general areas not related to implementational and computational issues are naturally suggested by the analogy that exists between the CQT and the peripheral auditory system. Noise suppression using two-dimensional spectral subtraction or thresholding would be able to track rapidly changing high-frequency information more closely in time, while better resolving in frequency the low-frequency portions of a signal. This frequency-specific resolution might reduce effects related to leakage of noise in areas where the ear is capable of fine resolution.

Acoustic enhancement experiments analogous to the well-known visual enhancement procedures seem promising. Constant bandwidth experiments performed by Callahan using two-dimensional techniques suggest the potential of such experiments if performed in the constant-Q spectral domain.

Studies by Searle [6] indicate that a transform which more closely emulates the analysis performed by the human ear could be advantageous in the automated recognition of speech. Information essential to the recognition of stops and bursts seems to occur with resolution finer than conventional analysis provides. As constant-Q algorithm speed improves, a system based on two-dimensional constant-Q recognition merits investigation.

Finally, the similarities between the distribution of information in the constant-Q spectral domain and auditory analysis suggest uniform quantization of a minimally-sampled constant-Q spectrum as a means of reducing overall bandwidth at minimal perceptual expense.


APPENDIX A


RESOLUTION PROPERTIES OF WINDOWS

The scaling property of the Fourier integral transform indicates a reciprocal relationship between the scale of events measured in the time and frequency domains. Specifically, if $h(t)$ is a time function defined over all $t$, and if the integral

$$H(\omega) = \int h(t)\,e^{-j\omega t}\,dt \tag{A.1}$$

exists, then it is true that

$$\frac{H(\omega/a)}{|a|} = \int h(at)\,e^{-j\omega t}\,dt \tag{A.2}$$

Define some characteristic time length, $T$, and a characteristic frequency length, $F$. The above relationship then guarantees that the product, $\theta$, of $T$ and $F$ is a constant:

$$\theta = TF \tag{A.3}$$

The value of $\theta$ depends entirely on the function $h(t)$ and on the definitions of $T$ and $F$. This property provides a simple way of relating the time and frequency resolution of an analysis window. We adopt the convention that resolution in either domain is measured as the width of the principal interval, centered on zero, over which the function is attenuated by less than $\sigma$ decibels from its maximum value. In the time domain, for common windows such as the Fourier, Hann, Hamming, Blackman and Bartlett windows, we pick $\sigma = \infty$. Thus, $T_\infty$ is the total non-zero length of a window. In the frequency domain, two measures are useful: $F_\infty$ and the so-called 3 decibel bandwidth, $F_3$.

From these we may define and compute values for the analysis window resolution products,

$$\theta_\infty = T_\infty F_\infty \tag{A.4}$$

$$\theta_3 = T_\infty F_3 \tag{A.5}$$

Table A.1 lists values of $\theta_\infty$ and $\theta_3$ for the windows mentioned above. Values for $\theta_\infty$ are exact and can be arrived at analytically; values for $\theta_3$ are determined numerically.
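A sketch of how $\theta_\infty$ and $\theta_3$ might be determined numerically is given below. It makes several assumptions that are not taken from the thesis. Each window is sampled at the midpoints of a unit-duration interval, so $T_\infty = 1$. $F_\infty$ is read as the main-lobe width between first nulls. "3 decibel" is taken as an attenuation of exactly 3.000 dB. The transform is estimated with a heavily zero-padded FFT. The function name is hypothetical, and the values printed are estimates, not the entries of Table A.1.

```python
import numpy as np


def resolution_products(w, oversample=256):
    """Estimate theta_inf = T F_inf and theta_3 = T F_3 (A.4, A.5) for a window
    whose samples w span a total duration T = 1, from a zero-padded FFT."""
    W = np.abs(np.fft.rfft(w, n=len(w) * oversample))
    W = W / W[0]                                   # peak response is at zero frequency
    u = np.arange(W.size) / oversample             # frequency in units of 1/T
    k_null = np.argmax(np.diff(W) > 0)             # first minimum: edge of the main lobe
    theta_inf = 2 * u[k_null]                      # full main-lobe width times T
    level = 10 ** (-3 / 20)                        # attenuated by 3 dB
    k3 = np.argmax(W < level)                      # first bin below -3 dB
    frac = (W[k3 - 1] - level) / (W[k3 - 1] - W[k3])    # linear interpolation
    theta_3 = 2 * (u[k3 - 1] + frac / oversample)
    return theta_inf, theta_3


M = 256
n = np.arange(M) + 0.5                             # cell midpoints of [0, T]
windows = {
    "rectangular": np.ones(M),
    "Hann": np.sin(np.pi * n / M) ** 2,            # 0.5 - 0.5 cos(2 pi t / T)
}
results = {name: resolution_products(w) for name, w in windows.items()}
print(results)
```

Under these assumptions, the rectangular window gives $\theta_\infty = 2$ and the Hann window gives $\theta_\infty = 4$. The Hann value of $\theta_3$ comes out near 1.438.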


APPENDIX B


SUPPLEMENTAL MATERIAL RELATIVE TO
GENERALIZED SHORT-TIME FOURIER SYNTHESIS

B.1 Validity of Generalized Short-time Fourier Synthesis

The continuous short-time Fourier transform, $F(\omega,t)$, of a function, $f(t)$, is given by

$$F(\omega,t) = \int f(\tau)\,h(t-\tau)\,e^{-j\omega\tau}\,d\tau \tag{B.1}$$

Allen and Rabiner [15] have described two commonly understood methods for synthesis of $f(t)$ from $F(\omega,t)$. The two syntheses, the filterbank summation (FBS) method and the overlap-add (OLA) method, are given in their continuous forms by equations B.2 and B.3, respectively.

$$f(t) = \frac{1}{2\pi h(0)}\int F(\omega,t)\,e^{j\omega t}\,d\omega \tag{B.2}$$

$$f(t) = \frac{1}{2\pi}\iint F(\omega,\tau)\,e^{j\omega t}\,d\omega\,d\tau \tag{B.3}$$

The analysis maps signals of the form $f(t)$ from the line into a subclass of the plane. In the absence of spectral domain modifications, the reverse mapping can be made with perfect fidelity using either B.2 or B.3. The set of spectral domain signals of the form $F(\omega,t)$ which is reachable via B.1 is restricted in resolution by the analysis window, $h(t)$. Signals which are not so constrained are, nevertheless, also reverse-mappable onto the line by members of a set of functions, called retracts, of which B.2 and B.3 are examples. The form of the retract determines how this reverse mapping occurs. This matters when considering spectral domain modifications which violate the time or frequency resolution constraints that $h(t)$ places on spectral domain signals. The OLA and FBS syntheses do not exhaust the possible forms of a more general class of retracts usable for short-time Fourier synthesis. Rather, they are special cases of the general retract,

$$\hat f(t) = \frac{1}{2\pi\langle g,h\rangle}\iint F(\omega,\tau)\,g(t-\tau)\,e^{j\omega t}\,d\omega\,d\tau \tag{B.4}$$

where $\langle g,h\rangle$ is the inner product,

$$\langle g,h\rangle = \int g(t)\,h(t)\,dt \tag{B.5}$$

The FBS and OLA syntheses are easily seen to be special forms of B.4. In particular, the FBS synthesis is obtained given the condition

$$g(t) = \delta(t) \tag{B.6}$$

where $\delta(t)$ is the Dirac delta function. Similarly, the OLA synthesis is obtained if

$$g(t) = 1 \tag{B.7}$$

and if the area of the analysis window is unity.

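We now show that B.1 and B.4 form an analysis-synthesis identity by substituting B.1 into B.4, renaming the variable of integration in B.1 to $\zeta$:

$$\hat f(t) = \frac{1}{2\pi\langle g,h\rangle}\iiint f(\zeta)\,h(\tau-\zeta)\,e^{-j\omega\zeta}\,d\zeta\; g(t-\tau)\,e^{j\omega t}\,d\omega\,d\tau \tag{B.8}$$

Modifying the order of integration, the complex exponentials may be combined and isolated,

$$\hat f(t) = \frac{1}{\langle g,h\rangle}\int f(\zeta)\int g(t-\tau)\,h(\tau-\zeta)\,\frac{1}{2\pi}\int e^{j\omega(t-\zeta)}\,d\omega\,d\tau\,d\zeta \tag{B.9}$$

and then integrated,

$$\hat f(t) = \frac{1}{\langle g,h\rangle}\int f(\zeta)\,\delta(t-\zeta)\int g(t-\tau)\,h(\tau-\zeta)\,d\tau\,d\zeta \tag{B.10}$$

With the change of variables $\mu = t - \tau$,

$$\hat f(t) = \frac{1}{\langle g,h\rangle}\int f(\zeta)\,\delta(t-\zeta)\int g(\mu)\,h(t-\zeta-\mu)\,d\mu\,d\zeta \tag{B.11}$$

If we then define

$$p(x) = \int g(\mu)\,h(x-\mu)\,d\mu \tag{B.12}$$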


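then B.11 may be simplified to

$$\hat f(t) = \frac{1}{\langle g,h\rangle}\int f(\zeta)\,\delta(t-\zeta)\,p(t-\zeta)\,d\zeta \tag{B.13}$$

$$\hat f(t) = \frac{1}{\langle g,h\rangle}\,f(t)\,p(0) \tag{B.14}$$

Finally, we recognize from B.12 that $p(0) = \int g(\mu)\,h(-\mu)\,d\mu$. For an analysis window symmetric about the origin ($h(-t) = h(t)$), this is just the inner product $\langle g,h\rangle$. Hence $\hat f(t) = f(t)$, and the validity of B.4 as a retract of B.1 is shown.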
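The identity can be checked on a discrete analogue of B.1 and B.4, sketched below. The discretization is an assumption, not taken from the thesis. Time is integer-sampled. The DFT length equals the signal length, so the inverse DFT over frequency recovers $f[n]\,h[\tau-n]$ exactly. Windows are odd-length arrays centred on index 0. All function names are hypothetical. The same code reproduces FBS (`g = [1]`), OLA (`g` constant), and an intermediate synthesis window.

```python
import numpy as np


def lookup(w):
    """Return a function that evaluates the odd-length window w, centred on
    index 0, at integer offsets (zero outside its support), plus its half-length."""
    half = len(w) // 2

    def at(offsets):
        offsets = np.asarray(offsets)
        vals = np.zeros(offsets.shape)
        inside = np.abs(offsets) <= half
        vals[inside] = w[offsets[inside] + half]
        return vals

    return at, half


def stft(f, h):
    """Discrete analogue of B.1: F[k, tau] = sum_n f[n] h[tau - n] exp(-2j pi k n / N)."""
    h_at, half = lookup(h)
    n = np.arange(len(f))
    taus = np.arange(-half, len(f) + half)          # every frame that overlaps the signal
    F = np.stack([np.fft.fft(f * h_at(tau - n)) for tau in taus], axis=1)
    return F, taus


def generalized_synthesis(F, taus, h, g):
    """Discrete analogue of B.4 with synthesis window g
    (g = [1] gives FBS; g = ones(len(h)) gives OLA)."""
    h_at, _ = lookup(h)
    g_at, _ = lookup(g)
    Ft = np.fft.ifft(F, axis=0)                     # Ft[n, tau] = f[n] h[tau - n]
    n = np.arange(F.shape[0])[:, None]
    weighted = Ft * g_at(n - taus[None, :])         # g(t - tau)
    L = max(len(g), len(h))
    mu = np.arange(-L, L + 1)
    inner = np.sum(g_at(mu) * h_at(mu))             # <g, h> of B.5
    return weighted.sum(axis=1).real / inner


rng = np.random.default_rng(0)
f = rng.standard_normal(64)
h = np.hanning(31)                                  # symmetric, so p(0) = <g, h>
F, taus = stft(f, h)
for g in (np.array([1.0]), np.ones(31), np.hanning(9)):
    print(len(g), np.max(np.abs(generalized_synthesis(F, taus, h, g) - f)))
```

All three synthesis windows reconstruct `f` to rounding error. The symmetric analysis window is what makes $p(0)$ equal to $\langle g,h\rangle$ here.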


B.2 Intuitive Description of Short-time Synthesis Issues

The meaning of B.4 is better understood by performing the indicated integration with respect to $\omega$. This gives

$$\hat f(t) = \frac{1}{\langle g,h\rangle}\int F_t(t,\tau)\,g(t-\tau)\,d\tau \tag{B.15}$$

where $F_t(t,\tau)$ denotes the one-dimensional inverse Fourier integral transform of $F(\omega,\tau)$ with respect to its first (frequency) parameter. An expression for the function $F_t(t,\tau)$ may also be derived from B.1 by performing an inverse Fourier integral transform along the $\omega$-axis. This yields

$$F_t(t,\tau) = f(t)\,h(\tau-t) \tag{B.16}$$


Note the change of variables: the time variable of the short-time Fourier transform has been renamed $\tau$. The function $F_t(t,\tau)$ is illustrated in Figure B.1a for a unit pulse input,

$$f(t) = \begin{cases} 1, & 1 \le t \le 2 \\ 0, & \text{otherwise} \end{cases} \tag{B.17}$$

and $h(t)$ is a Hann window.

Two methods of synthesis are obvious from the figure. The first, corresponding to the FBS synthesis of B.2, is achieved by evaluating $F_t(t,\tau)$ along the line $t = \tau$ (i.e., $g(t) = \delta(t)$ in B.15). As explained in Section 2.4, modifications made along the spectral time ($\tau$) axis are seen to "take effect" instantaneously in time; no time-resolution limiting occurs. One could, for instance, time-limit the short-time spectrum at a chosen value of $\tau$ and find that the FBS synthesis had been similarly truncated (Figure B.1b).

The other method of synthesis, the OLA synthesis of B.3, is achieved by integrating (or projecting) $F_t(t,\tau)$ along the $\tau$ axis. This corresponds to $g(t) = 1$ in B.15. Note here that changes to the short-time spectrum are limited in their time resolution by the shape of the analysis window. For instance, an attempt to time-limit the spectrum as above would give the result shown in Figure B.1c.
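The contrast between Figures B.1b and B.1c can be reproduced on a discrete grid. The sketch below builds $F_t[n,\tau] = f[n]\,h[\tau-n]$ directly from B.16 and discards every frame after a cut-off. It then synthesizes twice: once along the diagonal (FBS) and once by summing along $\tau$ (OLA). The pulse position, window length, and cut-off are hypothetical choices, not values from the figures.

```python
import numpy as np

T = 200
n = np.arange(T)
f = ((n >= 50) & (n < 150)).astype(float)          # discrete unit pulse (stand-in for B.17)
h = np.hanning(41)                                 # Hann analysis window, h[20] = 1
half = len(h) // 2
taus = np.arange(-half, T + half)                  # column j holds tau = j - half

# Ft[n, tau] = f[n] h[tau - n] (B.16), zero outside the window support.
offsets = taus[None, :] - n[:, None]
inside = np.abs(offsets) <= half
Ft = np.where(inside, h[np.clip(offsets + half, 0, len(h) - 1)], 0.0) * f[:, None]

# Time-limit the short-time spectrum: discard every frame with tau > tau_cut.
tau_cut = 100
Ft_mod = np.where(taus[None, :] <= tau_cut, Ft, 0.0)

fbs = Ft_mod[n, n + half] / h[half]                # g = delta: read the diagonal tau = t
ola = Ft_mod.sum(axis=1) / h.sum()                 # g = 1: integrate along tau

print(fbs[95:106])
print(np.round(ola[75:126:5], 3))
```

The FBS output stops abruptly at `tau_cut`. The OLA output tapers to zero over the width of the analysis window, centred on the cut-off.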

In generalized synthesis, we pick $g(t)$ somewhere between the extremes of $\delta(t)$ and 1. The convolution of B.15 then limits the resolution of effective changes along the time ($\tau$) axis of the short-time spectrum to that of the synthesis window, $g(t)$.

Suppose now that the spectral modifications for the extreme cases above occurred not as a function of time (along the $\tau$ axis) but in frequency (along the transformed $t$ axis). In this case, we might expect the modified $F_t(t,\tau)$ to appear as in Figure B.2a. Evaluating along the diagonal, as in FBS synthesis, the edges of our synthesized pulse cannot "ring" beyond the interval (1/2, 3/2) allowed by the analysis window (see Figure B.2b). Hence the attempted modification is time-limited to the dimension of the analysis window.

However, if we evaluate by integration along $\tau$, as shown in Figure B.2c, no time-limiting occurs, and the modification is allowed with perfect fidelity.

Again, by picking the synthesis window $g(t)$ somewhere between $\delta(t)$ and 1, the extent of time-limiting of frequency-axis modifications is controllable.


APPENDIX C


CONDITIONS FOR THE CONVERGENCE OF CONSTANT-Q SYNTHESIS

Because analysis window functions have been defined to have finite non-zero extent, their Fourier transforms do not. Hence, the existence of the synthesis integral,

$$\phi(v) = \int \Phi(\omega,v)\,d\omega \tag{C.1}$$

where

$$\Phi(\omega,v) = \frac{H\big((v-\omega)/\omega\big)}{|\omega|} \tag{C.2}$$

(see 3.8), cannot be guaranteed on the grounds that the integrand is non-zero only over a finite interval. To establish the existence of C.1, it will be necessary to determine restrictions on $h(t)$ which guarantee the integral's existence. This can be accomplished for a fairly general class of windows using the following assumptions.

First, assume that if the Fourier integral transform of $h(t)$ is $H(x)$, then for $x$ outside the interval $(-2, 0)$, $|H(x)|$ can be bounded above by $a_1/|x+1|$ for some finite constant $a_1$. This is the case for many windows having finite non-zero time extent, since their transforms can be expressed as a sum of shifted, scaled $\sin(x)/x$ functions. In particular, the transforms of the windows mentioned in Table A.1 decay at least as fast as $1/x$.

Second, assume that $|H(x)|$ can be bounded by $a_2|x+1|$ in the interval $(-2, 0)$ for some finite constant $a_2$. This restriction is realized for the above-mentioned windows only if we restrict available values of Q to a discrete set. The Hann window, as shown in Figure C.1, satisfies this criterion when


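$$\int h(t)\sin(t)\,dt = 0$$

for n = 2, 3, 4, .... In terms of Q,

$$Q = \frac{n}{\theta_3} \tag{C.3}$$

$$\theta_3 \approx 1.4382 \quad \text{(Hann window)} \tag{C.4}$$

$$Q = 0.6953114\,n, \qquad n = 2, 3, 4, \ldots \tag{C.5}$$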
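The discrete set of admissible Q values can be reproduced numerically. The sketch below rests on assumptions, not on the thesis's own computation. The second bound requires $H$ to vanish at $x = -1$, that is, at zero frequency. For a Hann window of duration $T$, this happens when the channel centre frequency equals $n/T$ and therefore sits on the $n$-th transform null ($n \ge 2$). This gives $Q = n/\theta_3$. The width $\theta_3$ is measured at an attenuation of exactly 3.000 dB, using the closed-form Hann response $\mathrm{sinc}(u)/(1-u^2)$. The function names are hypothetical.

```python
import numpy as np


def hann_response(u):
    """|H| of a unit-duration Hann window, normalised to 1 at u = 0 (u in units of 1/T)."""
    return np.abs(np.sinc(u) / (1.0 - u**2))


def hann_theta3(atten_db=3.0, iters=60):
    """theta_3 = T F_3: full width at which the response is down by atten_db, by bisection."""
    target = 10 ** (-atten_db / 20)
    lo, hi = 0.0, 0.99                 # the response falls monotonically on this interval
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if hann_response(mid) > target:
            lo = mid
        else:
            hi = mid
    return lo + hi                     # twice the midpoint of the final bracket


theta3 = hann_theta3()
print("theta_3 =", theta3)
for n in range(2, 6):
    # Zero frequency lies on the n-th null when the centre frequency is n / T,
    # so Q = centre frequency / F_3 = n / theta_3.
    print(n, hann_response(float(n)), n / theta3)
```

Under these assumptions, $1/\theta_3$ matches the coefficient 0.6953114 of C.5 to about four digits.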


Given the above, and a constant $a$ at least as large as the maximum of $a_1$ and $a_2$, we construct a function, $B(x)$, which bounds $|H(x)|$ as shown in Figure C.2a. We are now prepared to examine the integrability of C.1. Notice in Figure C.2b the effect of the change of variables $x = (v-\omega)/\omega$. Since $B(x) \ge |H(x)|$ implies $B((v-\omega)/\omega) \ge |H((v-\omega)/\omega)|$, the integrability of $B((v-\omega)/\omega)/|\omega|$ implies the integrability of $H((v-\omega)/\omega)/|\omega|$. To establish the integrability of the bounding function, we integrate it piecewise as indicated in Figure C.2b. This we write as


Figure C.2. A bounding function which guarantees the existence of the constant-Q synthesis integral for a Hann analysis window. The bounding function is indicated by the heavy line in both (a) and (b).
$$|\phi(v)| \le \int \frac{B\big((v-\omega)/\omega\big)}{|\omega|}\,d\omega \tag{C.6}$$

which simplifies to

$$|\phi(v)| \le 4a \tag{C.7}$$

Thus, our rather loose bound has established the existence of $\phi(v)$ for window functions which satisfy the criterion

$$|H(x)| \le B(x), \qquad x \in (-\infty, \infty) \tag{C.8}$$

where

$$B(x) = \begin{cases} -a/(x+1), & x < -2 \\ -a\,(x+1), & -2 \le x < -1 \\ a\,(x+1), & -1 \le x < 0 \\ a/(x+1), & 0 \le x \end{cases} \tag{C.9}$$

for some finite $a$.
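The value $4a$ in C.7 can be checked by integrating the bound of C.9 numerically. The sketch below evaluates $B((v-\omega)/\omega)/|\omega|$ on a log-spaced grid of $|\omega|$, covering both signs of $\omega$, and applies the trapezoidal rule. The grid extent and density are assumptions. The truncated tails contribute roughly $a \cdot 10^{-8}$ each.

```python
import numpy as np


def bound_B(x, a=1.0):
    """The piecewise bounding function B(x) of C.9."""
    x = np.asarray(x, dtype=float)
    return np.where(x < -2, -a / (x + 1),
           np.where(x < -1, -a * (x + 1),
           np.where(x < 0, a * (x + 1), a / (x + 1))))


def bound_integral(v, a=1.0, decades=8, num=400_001):
    """Trapezoidal estimate of the C.6 integral of B((v - w)/w)/|w| over all w != 0."""
    s = abs(v) * np.logspace(-decades, decades, num)     # |w| on a log-spaced grid
    total = 0.0
    for w in (s, -s):                                    # positive and negative w
        y = bound_B((v - w) / w, a) / np.abs(w)
        total += np.sum(0.5 * (y[1:] + y[:-1]) * np.abs(np.diff(w)))
    return total


print(bound_integral(1.0), bound_integral(-2.5, a=2.0))  # compare with 4a from C.7
```

Each of the four pieces of $B$ contributes $a$ after the substitution $x = (v-\omega)/\omega$. The result is therefore independent of $v$.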


APPENDIX D


THE PHASE DERIVATIVE AS A MEASURE
OF INSTANTANEOUS FREQUENCY AND BANDWIDTH

A general complex signal may be represented as

$$x(t) = a(t)\,e^{j(\omega t + \phi(t))} \tag{D.1}$$

where $a(t)$ and $\phi(t)$ may respectively be thought of as the amplitude and phase modulating functions. The quantity

$$\Phi(t) = \omega t + \phi(t) \tag{D.2}$$

provides information about instantaneous frequency and overall signal bandwidth. This line of thought is understood by observing the components of $\Phi(t)$, as in Figure D.1. As seen in this figure, if $\phi(t)$ were zero, the frequency of the complex sinusoid $e^{j\omega t}$ would equal the slope, $\omega$, of the line $\omega t$. For $\phi(t)$ non-zero, the slope is not constant. The "frequency" of the complex sinusoid $e^{j(\omega t + \phi(t))}$ must therefore be measured instantaneously as

$$\frac{d}{dt}\big[\omega t + \phi(t)\big] = \omega + \dot\phi(t) \tag{D.3}$$

Thus, $\dot\phi(t)$ can be thought of as an instantaneous perturbation of the center frequency, $\omega$, and $\omega + \dot\phi(t)$ as an instantaneous frequency.

Plotting the function $\dot\Phi(t) = \omega + \dot\phi(t)$, as in Figure D.2, reveals a further interpretation of $\dot\phi(t)$. For many practical $x(t)$, the instantaneous frequency of $x(t)$ may lie in a band around the center frequency, $\omega$, whose width is a function of the range of $\dot\phi(t)$. The width of this band may be thought of as a measure of the bandwidth of $x(t)$. Clearly, scaling $\dot\phi(t)$, and therefore the deviation of $\dot\Phi(t)$ from $\omega$, similarly scales the width of this band. An alternative to this measure of bandwidth is discussed in Section 5.5.
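The two bandwidth measures can be compared on a test signal, as in the sketch below. The centre frequency, modulating functions, and helper names are hypothetical. The second-moment measure is computed about the mean frequency of a periodogram, which is an assumption. Scaling $\phi$ by 2 doubles the band swept by $\omega + \dot\phi(t)$ exactly. In the presence of amplitude modulation, however, it scales the second-moment bandwidth by slightly less than 2. That is the discrepancy addressed by 5.27.

```python
import numpy as np

N = 4096
t = np.arange(N) / N
omega = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)
w0 = 2 * np.pi * 200                               # hypothetical centre frequency
a = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * t)          # hypothetical amplitude modulation
phi = 2.0 * np.sin(2 * np.pi * 5 * t)              # hypothetical phase modulation


def deriv(y):
    """Spectral (FFT) derivative of a real, periodic, band-limited sequence."""
    return np.fft.ifft(1j * omega * np.fft.fft(y)).real


def inst_freq_band(phi):
    """Width of the band swept by the instantaneous frequency w0 + phidot (D.3)."""
    f_inst = w0 + deriv(phi)
    return f_inst.max() - f_inst.min()


def rms_bandwidth(x):
    """Second moment of the periodogram about its mean frequency."""
    S = np.abs(np.fft.fft(x)) ** 2
    mean = np.sum(omega * S) / np.sum(S)
    return np.sqrt(np.sum((omega - mean) ** 2 * S) / np.sum(S))


def signal(scale):
    return a * np.exp(1j * (w0 * t + scale * phi))   # D.1 with phi scaled


ratio_inst = inst_freq_band(2.0 * phi) / inst_freq_band(phi)
ratio_rms = rms_bandwidth(signal(2.0)) / rms_bandwidth(signal(1.0))
print(ratio_inst, ratio_rms)
```

For this test signal, `ratio_rms` is about 1.985 rather than 2, because the amplitude term of 5.22 does not scale with $\dot\phi$.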